Too close for comfort?


Relationships between industry and researchers can be hard to define, but universities and other 
institutions must do more to scrutinize the work of their scientists for conflicts of interest. 


What sort of industry connections could buy influence over
a scientist’s research results? Research grants as small as
US$5,000? Money to support outreach that bolsters the
industry's image? Equity in a spin-off company founded by the scientist?
Defining what constitutes a conflict of interest — much less regulating 
it — continues to vex funding agencies, journals and institutions. Last 
month, for instance, Nature revealed that an activist organization had 
filed freedom-of-information requests to see the e-mails of research- 
ers who work on genetically modified crops (see Nature 524, 145-146; 
2015). Among other findings, their haul revealed that plant scientist 
Kevin Folta at the University of Florida in Gainesville had accepted a 
no-strings-attached $25,000 grant from the agriculture giant Monsanto. 

In his defence, Folta argued that the money supported only travel 
and outreach, not research, and he was therefore under no obligation 
to disclose it. This seems to be consistent with his institution’s guide- 
lines, and there is no evidence of any wrongdoing or that his research 
was compromised. 

Solar physicist Willie Soon, a climate-change sceptic at the Harvard- 
Smithsonian Center for Astrophysics in Massachusetts, also seems to 
have been operating within institutional policy when advocacy groups 
revealed in February that he had accepted more than $1 million from 
the energy industry, among other funders. (However, his failure to 
disclose those relationships might have violated the policies of some 
journals in which he published; see Nature http://doi.org/2jx (2015).) 

In trying to navigate such complexities, the US National Institutes 
of Health (NIH) has been ahead of the curve — presumably because of 
long-standing concerns about physicians’ industry relationships and the 
high stakes for protecting patients. Its parent agency, the Department of 
Health and Human Services (HHS), was the first to establish conflict-of- 
interest disclosure rules in 1995 and is still ahead of many of its
counterparts in maintaining unified regulations that include yearly reports to
the government. By contrast, as one example, the US National Science
Foundation’s grants policy suggests that institutions look to scientific
societies for ideas on how to manage a conflict of interest, and report
back to the foundation only if institutions cannot handle it themselves.

But even the HHS rules were not enough to guarantee full transpar- 
ency. In 2009, a congressional report and subsequent media coverage 
found that some NIH-funded researchers had quietly accepted millions 
of dollars from industry. Again, the blame kept shifting: the universities 
said that the researchers had not reported the conflicts, the NIH received 
only bare-bones reports from institutions, and the researchers said that 
they did not know they were breaking any rules. 

The HHS updated its policies in 2011, but pleased no one. The 
government underestimated the time and money that institutions would 
spend implementing new rules. And some aspects of the reforms have 
proved to be window dressing: a Nature investigation this week reveals 
that these reforms have uncovered few conflicts of interest that would 
have escaped the original regulations (see page 300). 


The reforms may not be perfect, but they address real issues and others 
should take note. They make it clear that institutions are accountable, 
that they must educate their researchers on financial disclosure and that 
they should evaluate whether an industry relationship is problematic. 
The reforms also enlist a second pair of eyes by requiring institutions to 
report details of the conflict and its management to the NIH. Perhaps 
most importantly, the reforms remove the excuse of plausible deniability
by clearly stating the kinds of financial relationship that could be
considered conflicts.

One thing has become clear: conflicts are slippery to define, so it is
important for as many funders, institutions and journals as possible to
make as many demands as necessary. Had Kevin Folta been funded by the
NIH, the HHS guidelines would have required him to report the Monsanto
money. And if Willie Soon had had an NIH grant, his institution would have
designed a ‘management plan’ that could have required his industry
relationships to be stated in publications and lectures.

The HHS rules could backfire. Institutions do not want the publicity 
and work that accompany an identified conflict. Because they hold the 
power to decide whether a relationship presents a conflict, they could 
theoretically give their researchers a pass. Nature’s investigation suggests 
that institutions use vastly different standards to evaluate such relation- 
ships, meaning that the rule is unevenly applied. And the current system 
makes it difficult for the public to access the conflict reports. 

Still, the HHS should be commended for at least attempting to 
address the problem, even if it was forced into doing so. Other funders 
and institutions could do worse than to learn from its successes and
mistakes as they define and strengthen their own policies.




Mind meld


Interdisciplinary science must break down
barriers between fields to build common ground.


INTERDISCIPLINARITY
A Nature special issue
nature.com/inter


In Castlegar, Canada, there is a golf shop that also offers vacuum-cleaner
repairs, and in the Czech Republic town of Kostelec nad Orlicí, a business
will sell you both wine and underwear. Such odd couplings are humorous
because of their curiously limited scope. There is nothing funny, after
all, about a hypermarket that repairs equipment and sells golf clubs,
wine, underwear and everything else under the Sun.

The binary combinations also lead us to assume something about 
the shop’s owners. Faced with a specific set of circumstances, these 
businesses redefine what we expect from a shop and offer something 
distinct. 

There are greater problems in the world than what to do with your 
vacuum cleaner while you decide what make of balls to buy, but the 
principle is worth remembering as you browse this week’s special issue 
of Nature, which we dedicate to interdisciplinary science. 

Most scientists are aware of the term, and many will have used it. But 
how many are truly engaged in it? Done correctly, it is not mere multi- 
disciplinary work — a collection of people tackling a problem using their 
specific skills — but a synthesis of different approaches into something 
unique. It is the wine and underwear shop, not the hypermarket. 

The best interdisciplinary science comes from the realization that 
there are pressing questions or problems that cannot be adequately 
addressed by people from just one discipline. Witness the gathering 
of the scientific tribes — and the merging of approaches — for the 
Manhattan Project to work on the atomic bomb. More recently, Nature 
has reported on ‘implementation science’, which combines medical
expertise with local knowledge on how best to carry out programmes 
to improve public health (see Nature 523, 516-518; 2015). 

An interdisciplinary approach should drive people to ask questions 
and solve problems that have never come up before. But it can also 
address old problems, especially those that have proved unwilling to 
yield to conventional approaches. 

Enough of the rhetoric; what about the reality? It is hard to deny that
the scientific system — from funding streams and academic rewards 
to university departments and journals — does not encourage much 


overlap between disparate subjects. It is easy to set up a ‘Centre for
Interdisciplinary Research’, but who will be prepared to join it? If
governments, funders and universities want to encourage more basic
researchers to leave their trenches, then they need to make the
no-man’s-land of interdisciplinarity a more welcoming place to build a
career. The obstacles are many, as we discuss in the pages that follow. 
Some groups have found ways to overcome these obstacles, and
some high-quality interdisciplinary work is under way. What are the
key lessons from these successes?

Interdisciplinary science takes longer than conventional projects, and
that makes it more expensive. Funders must accept and embrace
this and hold their nerve if the pay-off from individual projects takes
longer than expected.

True interdisciplinary science cannot be rushed, not least because 
the best course of investigation is rarely clear at the outset. Research 
questions must be assessed and decided with input from all involved. 
An interdisciplinary project cannot exist as one main subject that 
sucks in the majority of the resources and leaves the partners as orbit- 
ing satellites. 

Communication is crucial. The varying use of language across disci- 
plines might seem a superficial problem, but it is one that must be solved, 
or misunderstandings will undermine the foundations of the project. 
There must also be no hierarchy, or perceived hierarchy. All involved
must be confident that colleagues from other disciplines apply equal
academic rigour and have equal scientific standing, even if the methods
used in rival fields seem alien. It takes time to see the value in other
approaches. It takes an open mind to appreciate an appliance-mending golf shop.


Protection priority 


All involved in animal research must ensure
that rules for ethical experiments are observed. 


More than a million people in Europe signed a petition earlier
this year to halt research with animals. One reason why
Nature and many scientists are able to defend these
experiments is that all involved do everything they can to minimize pain and
suffering. Animal experiments are approved only after thorough discus- 
sion and are carried out according to strict regulatory controls. Society 
sees the benefits of animal research, but it does not seek them at any cost. 

When breaches of the strict rules that govern animal research occur, 
it is vital — to both supporters and opponents — that they are inves- 
tigated thoroughly, and that lessons are learnt and shared. This week, 
Nature publishes a correction on its website that details such a breach 
of experimental protocol in a previously published paper (L. Raj et al. 
Nature http://dx.doi.org/10.1038/nature15370; 2015).

The relevant experiments grew tumours in mice as a way to test
possible treatments. This type of study is common, as is the way
such studies are approved and regulated. Researchers typically plan the
experiments and then submit details to an institutional review board 
for approval. In making its decision, the board follows guidelines 
set out by a separate body charged with oversight of animal pro- 
cedures — an institutional animal care and use committee. These 
guidelines are country-specific, and in the case of tumour experi- 
ments should include limits on the maximum tumour size allowed, 
and instructions to the researchers to monitor both tumour size and 
signs of distress. 

In this case, prompted by a complaint from a reader and follow- 
ing consultation with the authors and the relevant bodies, Nature 
has established that the scientists did not carry out the required 


monitoring properly. As a result, some of the tumours grew larger 
than permitted. These mice could therefore have experienced more 
pain and suffering than originally allowed for. 

As well as writing to correct their paper to mark the breach of animal- 
welfare guidelines, the authors apologize for the breach. They are right to 
do so. Cases such as this could provoke a justifiable backlash against
animal research. All involved — scientists, institutions, funders and
journals — must do more to ensure that regulations are strictly observed.

Nature’s policy is that the corresponding author on a paper that 
reports experiments with animals must confirm that the research was 
carried out in accordance with the relevant rules (see go.nature.com/a9pjym).
As a result of this case, we are increasing the amount of information
we request from authors. In experiments in which tumours are
grown, we now require authors to include the maximal tumour size 
permitted by the institutional animal-use committee, and to state that 
this was not exceeded. Authors must also provide the source data for 
any figures that analyse tumour growth. 

Nature does not want to publish the results of experiments that have 
not been performed under ethical guidelines. As such, the authors in 
this case are correcting their paper to withdraw the portion of the data 
collected in experiments that the institutional committee concluded 
were in breach. The scientific conclusions of the paper remain valid and 
useful, and still stand. 

Institutions should do more to make sure that the guidelines they set 
are respected. At the very least, on completion of each project — and 
before data are submitted — institutions should verify that approved 
protocols were followed. Funders and institutions must consider 
better training for young researchers doing work with animals. And 
the broader community should continue to scrutinize and improve 
how it carries out these types of experiment. Discussions are already 
under way, for example, on whether the con- 
trol arms of similar cancer studies truly need 
to let (untreated) tumours grow as large as they 
currently do. Nature is happy to join these 
discussions and to help to improve practice.
WORLD VIEW


Integration of social science
into research is crucial


Social scientists must be allowed a full, collaborative role if researchers are to
understand and engage with issues that concern the public, says Ana Viseu.


Funders and institutions increasingly prioritize research that
addresses the challenges and opportunities of an inherently
interdisciplinary world. Policymakers and influential voices in
science — including Nature — have also warned of a worrying
disconnect between research and the needs and concerns of the public. One
proposed solution is the integration of social scientists such as myself
into publicly funded research initiatives. This is expected to contribute
to the production of ‘better’ science.

Not in my experience. I spent three years as an in-house social
scientist at the Cornell NanoScale Science and Technology Facility in
Ithaca, New York, and the US National Nanotechnology Infrastructure
Network, and it was a futile and frustrating time. I left a decade ago,
but friends and colleagues who have since worked on similar projects
tell me that the problem is widespread and that
little has changed. Too many in the physical and
life sciences dismiss social sciences as having a
‘service’ role, being allowed to observe what they
do but not disturb it.

In its current model, integration is fuelled by
the assumption that projects bring in the social
sciences to carve a place for ‘society’. This is
expected to maximize the benefits of research
while reducing negative impacts and public
controversy. In other words, rather than being
scientists in our own right, we are brought along as
silent partners whose job it is to care for science.
Rather than blurring boundaries and labour
divisions, integration works to reify them. Thus, the
questions that social scientists ask and the
expertise we can contribute are muted or made invisible
because we remain outside ‘proper’ science.

Integration is also deeply asymmetrical. The social sciences (often a
single social scientist) are typically brought in after the project has taken
shape. This asymmetry is present in every aspect of integration — from
power to personnel numbers, funding, knowledge production and,
ultimately, independence — but remains hidden in mundane
interactions that dictate what counts as a valid social-science activity and
who gets to define it.

This is not genuine integration. It pays lip service to the idea and is a
waste of everyone’s time and the public money that supports it.

When I began my work alongside the nanotechnology scientists, I
naively expected that my expertise as an ethnographer would be useful.
I was prepared to study the culture of a laboratory and to probe its
interaction with wider society. I thought that this would be helpful, given the
frequent statements made by nanotechnology
experts about how they wanted to engage and
talk about the risks and benefits of their work.
Instead, the other scientists seemed to view my
role as one of managing a narrow list of possible
risks and consequences, so that if a researcher followed my instructions
and ticked boxes, then I would bless them as ‘social and ethical’ and 
they would be free to do their work with no concerns. I was routinely 
(wrongly) introduced as an ethicist and was expected to find minimal, 
non-disruptive ways of dealing with social and ethical issues. This was 
not a job that I could do nor wanted to do. Worse, my attempts to 
build bridges with my technical colleagues, for example by donning a 
cleanroom suit and learning how to use some of the equipment, were 
classified in lab annual reports as “outreach”. My perceived contribution 
was not one of expertise, but rather of a willingness to be educated in 
the proper way of thinking about nanotechnology. 

Although my experience has left me sceptical of integration, I am not
ready to dismiss the idea of fruitful collaboration between the natural 
and social sciences. Some fixes could be easily 
implemented: initiatives aiming for integration 
should have teams of social scientists, instead of 
one or two individuals, and these teams should be 
given the financial and operational autonomy to 
define and implement their activities. 

When integration is planned, there should be 
a reassessment of what social scientists call the 
‘positionality’ of the projects, which determines 
who pays for the research and thus who has the 
power to decide what is done, how it is done and 
what can be said about it. 

For the social sciences to make meaningful 
contributions, funding structures must also be 
rethought. Ideally, we would see increases in 
stand-alone funding for social-science strands 
without requirements for integration or subordi- 
nation to a topic. But this seems unlikely. There- 
fore, we must push for project funding structures that — from the 
start — allocate and ring-fence money for the social-science component. 

But this is not enough. For ‘integration’ to be productive, we must
change its very meaning, from one of service to collaboration between
equals. Doing so involves changes to scientific education and practice as
well as continued reframing of our definitions of success. We must insist
on the value of complexity, so that divergent thinking is not eclipsed in 
the effort to speak with one voice. We must make room for the disputes 
that are at the centre of knowledge production. 

This is all the more important because, in a world of decreased 
funding for social sciences and humanities, speaking out of tune is 
both difficult and crucial. So we must begin to think of new means of 
partnership that will benefit us all.


Ana Viseu is associate professor at the Universidade Europeia in Lisbon,
and a member of the Centro Interuniversitário de História das Ciências e
Tecnologia, Universidade de Lisboa de Ciências, University of Lisbon.
e-mail: ana@anaviseu.org 
RESEARCH HIGHLIGHTS


Selections from the
scientific literature


Climate sceptics
use strong words


Climate scientists use more
cautious language in scientific
reports than do climate-
change sceptics, even though
the sceptics often accuse the
scientists of being alarmist.

Srdan Medimorec and
Gordon Pennycook at the
University of Waterloo
in Canada used software
to analyse the style of
language in a report by the
Intergovernmental Panel
on Climate Change (IPCC)
in 2013 and in a response
written by a sceptic group,
the Nongovernmental
International Panel on
Climate Change (NIPCC).
The researchers did not assess
the scientific accuracy of the
reports but found that the
NIPCC report used emotional
language and the IPCC report
contained more neutral and
formal phrasing.

The authors hypothesize
that the IPCC uses such
language because of scrutiny
from the media and sceptics.
Clim. Change http://doi.org/7mb
(2015)


NUCLEAR PHYSICS


Forensics reveals
uranium’s past


Uranium from German
experiments during the Second
World War was not used
in a nuclear reactor for any
appreciable amount of time.

Maria Wallenius at the
European Commission Joint
Research Centre's Institute
for Transuranium Elements
in Karlsruhe, Germany, and
her colleagues did a forensic
analysis of uranium samples
(pictured) used in 1940s
experiments in Germany.
They looked for trace elements
and isotopes of uranium and
plutonium that are created
when neutrons released during
nuclear fission smash into
other atoms.

They traced the origin
of the uranium to a mine
in the Czech Republic, and
found that isotope ratios
matched those found in
natural uranium ore. The
samples were never used in
experiments that reached the
critical mass necessary for
sustained nuclear fission.
Angew. Chem. Int. Ed.
http://doi.org/f3f7js (2015)


ANIMAL BEHAVIOUR


Whales that click create cliques


Sperm whales form clans by learning vocal calls
from others that sing like them. This kind of
‘cultural transmission’ has been seen as a mainly
human trait.

Sperm-whale clans use distinct dialects of
clicks to communicate. To learn how their
complex societies form, Mauricio Cantor
at Dalhousie University in Halifax, Canada,
and his colleagues used 18 years of data on
the acoustic calls of sperm whales (Physeter
macrocephalus; pictured) from around the
Galapagos Islands to build several possible
models of whale populations. In their
simulations, the clans that have been observed
in nature did not form when the vocal calls were
genetically inherited or learned from other
sperm whales in general. But clans did form
when the animals adopted the most common
calls produced by certain individuals — mainly
those with similar communication patterns.
This further suggests that humans are not
the only mammals that segregate according to
similarities in learned behaviour.
Nature Commun. 6, 8091 (2015)


CANCER

A trap for roving
cancer cells


Implanting a polymer scaffold
in mice that have tumours
captures spreading cancer cells,
enabling their early detection.
Lonnie Shea at the University 
of Michigan in Ann Arbor and 
his colleagues placed human 
breast-cancer cells in mice and 
implanted the scaffolds in their 
abdomens a week later. Two 
weeks after cell transplantation, 
the researchers detected cancer 
cells in the scaffolds but not 
in the lungs or liver, where 
breast cancer often spreads. 
After 28 days, mice with 
scaffolds had fewer tumours 
in their lungs than did animals 
without scaffolds. And using 
an imaging technique, the team 
measured changes in the tissue
properties within the scaffold
that indicated the presence of 
cancer cells. 

An inflammatory response 
to the scaffold attracted the 
cancer cells. This approach 
could eventually be used 
in humans to detect the 
early spread of cancer, the 
authors say. 

Nature Commun. 6, 8094 (2015) 


PLANETARY SCIENCE 


A faster spin for 
Mercury 


Mercury rotates nine seconds
faster than scientists had
thought, probably because
of gravitational effects from
Jupiter.

A team led by Alexander 
Stark of the German Aerospace 
Center in Berlin studied three 
years of data from NASA's 
MESSENGER spacecraft, 
which orbited the planet 
between 2011 and 2015 and 
measured Mercury’s rotations 
more precisely than ever before. 

The data also confirm that 
the planet has a molten outer 
core, causing this part to rotate 
at a different speed from the 
solid inner layers. 

Geophys. Res. Lett. http://doi. 
org/7mc (2015) 


Muscle wasting 
blocked in mice 


Giving tumour-bearing mice 
specific proteins prevents a 
muscle-wasting syndrome that 
commonly affects people with 
cancer. 

Many patients with cancer 
die from severe muscle loss 
(cachexia), which has no 
treatment. To find a way to 
halt the condition, Amelia 
Johnston and Nicholas 
Hoogenraad at La Trobe 
University in Melbourne, 
Australia, and their colleagues 
injected mice with mouse 
cancer cells that had been 
engineered to express a 
human gene encoding the 
protein Fn14, which drives 
cancer growth. The animals 
lost muscle and fat, but giving 
the mice an antibody against
Fn14 stopped cachexia.
Moreover, in a mouse model 
of cachexia, the animals lived 
longer and maintained body 
weight when treated with an 
anti-Fn14 antibody, compared 
with untreated mice. 
Targeting Fn14 proteins that
are generated by tumours could
be a treatment strategy for this
condition, the authors say.
Cell 162, 1365-1378 (2015) 


ASTRONOMY 


The farthest 
galaxy so far 


Astronomers have observed 
the most distant galaxy yet 
by detecting photons emitted 
from its clouds of hydrogen 
when the 13.8-billion-year- 
old Universe was less than 
600 million years old. 

Such photons rarely make 
it to telescopes on Earth, but 
Adi Zitrin at the California 
Institute of Technology in 
Pasadena and his colleagues 
were able to detect them using 
a telescope at the W. M. Keck 
Observatory on Mauna Kea,
Hawaii. They found that 
the wavelength of arriving 
photons had been stretched 
en route, indicating that the 
galaxy, named EGSY8p7, is 
more than 13.2 billion light 
years (4 billion parsecs) away. 

Seeing hydrogen emission 
from such a distant galaxy 
may challenge current 
understanding of the 
evolution of the Universe, the 
authors say. 

Astrophys. J. Lett. 810, L12 (2015) 


ECOLOGY 


Marauding ants 
bring disease 


One of the most widespread 
invasive ant species not only 
displaces native ants, but also 
carries viruses. 

Phil Lester at Victoria
University of Wellington
and his colleagues searched
for viral sequences in RNA
extracted from Argentine ants
(Linepithema humile; pictured)
in New Zealand. They found
a virus that they named
Linepithema humile virus 1,
which could explain periodic
crashes in Argentine ant
populations. They also found
that the ants carried deformed
wing virus, which can be fatal
to honeybees.

The team suggests that bees
could become infected when
the ants forage or raid bee nests.
Biol. Lett. 11, 20150610 (2015)


Weyl particles
discovered


Three separate teams have
found analogues of Weyl
fermions: massless elementary
particles that were first
predicted in 1929 but
have never been observed.

Physicists searching for
these fermions look for their
unusual properties in the
collective behaviour of other
particles. Hong Ding and Tian
Qian at the Chinese Academy
of Sciences in Beijing and
their colleagues saw these
‘quasiparticles’ by probing a
sample of tantalum arsenide
with a beam of X-rays. In
July, a separate group of
researchers led by Zahid Hasan
at Princeton University in
New Jersey announced that
they had seen the particles in
the same material. Ling Lu at
the Massachusetts Institute
of Technology in Cambridge
and his colleagues reported
seeing signs of the particles in
the behaviour of light passing
through a crystal.

Such experimental systems
could allow researchers to
probe the exotic properties
associated with Weyl particles.
Phys. Rev. X 5, 031013 (2015);
Science 349, 613-617; 622-624
(2015)


SOCIAL SELECTION


Popular topics
on social media


Science failings shared on Twitter


Researchers’ best success stories end up in journals, but
many of their less-successful ones found their way on to
Twitter this week with the hashtag #FailingInSTEM. Tales
of low points and often-humorous mishaps reassured
others that failures can be overcome on the way to scientific
success. “The #FailingInSTEM tweets are so important!
It's so comforting to know that other scientists make
mistakes,” tweeted Aimee Eckert, a PhD student in cell
biology at the University of Sussex in Brighton, UK. Nicole
Cabrera Salazar, an astronomy PhD student at Georgia State
University in Atlanta, started the #FailingInSTEM Twitter
discussion after a friend of hers suffered a scientific setback:
“We need to let our young ppl know that regular, fallible
people do science. We make mistakes everyday. It’s part of
the job #FailingInSTEM.” She suspected that other young
researchers could use a reminder that science is not all about
successful experiments and flashy publications. “People don't
talk about all of the times that they broke something in a lab
or got heckled during a presentation,” she says.


NATURE.COM
For more on
popular papers:
go.nature.com/mzblhl
SEVEN DAYS


Clone-products ban 


The European Parliament has
voted in favour of a sweeping
ban in the European Union
on the use of food and feed
products — domestic or
imported — from cloned
animals and their descendants.
The rules tighten a draft
law proposed in 2013 by
the European Commission
to prohibit the sale of food
products derived from cloned
animals. Parliamentarians
said that the proposed
amendments, voted in on
8 September, reflect widespread
consumer concerns over
animal welfare and food safety.
But Vytenis Andriukaitis,
the European commissioner
for health and food safety,
called the amendments
“disproportionate” and warned
that they might prove “legally
impossible”.


Energy review

On 10 September, the US
Department of Energy released
its second Quadrennial
Technology Review, which
looks at current energy
technologies and identifies
opportunities for research and
development. The report
suggests that the US energy
system is becoming increasingly
diverse with the rise of renewable
power, as well as more
interconnected through
Internet and communications
technologies; both trends open
the door to cleaner energy and
fewer emissions. Although
the United States has made
progress on energy efficiency,
the report notes that substantial
opportunities remain for
reducing energy consumption
and costs.


NUMBER CRUNCH


879


Total number of days Russian
cosmonaut Gennady Padalka
has spent in space, the most
by any individual. Padalka
returned from his latest stay
on the International Space
Station on 11 September.


Pluto's ‘heart’ snapped in high resolution


The left lobe of Pluto's bright heart-shaped
feature, Tombaugh Regio, is clearly seen in
the upper right of this image released on
10 September. The view spans 1,800 kilometres
and is generated from high-resolution images
gathered during the 14 July fly-by of Pluto
by NASA's New Horizons spacecraft. The
boundary between the bright, icy plains (called
Sputnik Planum) and dark, cratered terrains
(called Cthulhu Regio) is particularly striking.


Sonar muffled

The US Navy has agreed to
limit its sonar and explosives
activities in areas that might
harm dolphins, whales and
other marine mammals. The
agreement between the navy
and environmental groups,
including Earthjustice and the
Natural Resources Defense
Council, was ordered by a
federal judge on 14 September.
The navy will not be able to
use sonar, which can disrupt
communication between
marine mammals, in areas off
the Southern California coast.
Areas around Hawaii's islands
are protected from sonar and
explosives training operations.


Energy ambitions
The California legislative
assembly passed legislation on
11 September that will increase
requirements for renewable-
energy production in the state.
The bill, sought by Governor
Jerry Brown, raises the current
renewable-energy quota of 33%
by 2020 to a more rigorous
50% by 2030. It also sets a goal
of doubling energy efficiency in
the electricity and natural-gas
sectors by 2030. An earlier draft
of the bill would have required
the state to halve its oil use
over the same period, but the
provision was dropped because
of feasibility and cost concerns.


Land undervalued 
Global land degradation 
costs between US$6.3 trillion 
and $10.6 trillion per year, 
according to the Economics 
of Land Degradation (ELD) 
Initiative in Bonn, Germany. 
Ina report published on 

15 September, the ELD 

said that unsustainable 

land management ruins 
productivity and removes 
ecosystem services that have 
no market value, including 
nutrient recycling and 
disease regulation. The loss 
figure includes the costs of 
replacing these services, for 


NASA/JHU APL/SRI 
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SOURCE: C. HANDKEY ET AL. SOC. SCI. RES. NETW. HTTP://DOILORG/7PX (2015). 


example by buying fertilizers 
or vaccines. People without 
funds to replace services 

such as clean water are 
particularly vulnerable to land 
degradation. 


Permafrost tracked 
The first international database of standardized permafrost data was launched this week by the Global Terrestrial Network for Permafrost (GTN-P), an international consortium that aims to establish an early-warning system for permafrost thawing, for use by scientists and policymakers. The European Union-funded database gathers frozen-soil temperatures and annual thaw depths. Permafrost has a key role in climate-change modelling, because when it thaws it can release the greenhouse gases carbon dioxide and methane.


EVENTS 


Turnbull coup 
Malcolm Turnbull has ousted his fellow Liberal party member Tony Abbott as Australian prime minister, after forcing a party ballot on 14 September. Turnbull won the leadership vote by 54 votes to 44. On his first day, Turnbull said that he would keep current climate-change policies for now. In 2014, Abbott repealed Australia’s carbon tax and scrapped an emissions-trading scheme, disappointing Australian climate scientists. In 2009, Turnbull was overthrown as party leader by Abbott, in part because of Turnbull's support for an emissions-trading scheme.


TREND WATCH

Researchers in countries that often give academic exemptions from copyright laws, such as the United States, publish more data-mining studies than do researchers in places such as Germany and France, where academics must first gain consent. The work, presented at the 2 September European Policy for Intellectual Property meeting in Glasgow, UK, analysed 18,441 articles on data mining, a method that allows investigators to trawl large data sets for discoveries (C. Handkey et al. Soc. Sci. Res. Netw. http://doi.org/7px; 2015).


Pesticide repealed 


A US appeals court has rescinded the approval by the Environmental Protection Agency (EPA) of an insecticide named sulfoxaflor. The approval was challenged by a group of bee-keeping organizations, which cited evidence that sulfoxaflor, a neonicotinoid compound with an unusual mechanism, is highly toxic to bees. The court ruled on 10 September that the EPA's decision was based on “flawed and limited data”. Dow AgroSciences, which won approval for sulfoxaflor in 2013, is considering challenging the ruling.


BUSINESS

Phage trial starts

A European company has started the first randomized clinical trial using bacterium-killing viruses called phages. On 9 September, Pherecydes Pharma, based in Romainville, France, announced that it had begun enrolling people with burns who are susceptible to infection by the bacteria Escherichia coli and Pseudomonas aeruginosa. The company will test two cocktails of phages that attack these species in trials involving 220 people in Switzerland, France and Belgium. Phage therapy has been used for decades in Eastern Europe, but has not yet been tested in a large, controlled trial.


FUNDING

Boost for Africa

Researchers in African nations will share more than £46 million (US$70 million) in a programme to build science capacity. The first seven ‘Developing Excellence in Leadership, Training and Science’ (DELTA) awards were announced on 10 September. They range from mental-health research in Zimbabwe to science-leadership training in Kenya. Funded by the London-based biomedical charity the Wellcome Trust, together with the UK Department for International Development and the Bill & Melinda Gates Foundation, the DELTAs programme will from 2016 be managed by the newly formed Alliance for Accelerating Excellence in Science in Africa (see Nature 520, 142-143; 2015).


COPYRIGHT IMPACTS ON DATA MINING

In countries where copyright restrictions are relaxed for researchers mining large data sets, more data-mining articles are published.


17-18 SEPTEMBER
Scientists from across academia and industry gather in Nairobi to discuss biotechnology and biomedical science in Africa.
http://aibbc.org/


17-21 SEPTEMBER
The 30th International Papillomavirus Conference convenes in Lisbon, offering workshops in clinical and public health.
http://www.hpv2015.org/


21-25 SEPTEMBER
Sandbjerg, Denmark, hosts the 3rd International Workshop on Microbial Life under Extreme Energy Limitation.
http://microenergy2015.org/


New sponsor sought 
Intel will drop its support for the oldest and most prestigious science contest for US high-school students, Science Talent Search, after the 2017 event. Eight Nobel prizewinners, five winners of the National Medal of Science and many other scientists are alumni of the 73-year-old programme, in which students are judged on original research. The Society for Science & the Public in Washington DC, which runs the competition, announced on 9 September that it was looking for a new sponsor after Intel pulled its US$1.6-million annual backing, begun in 1998.


NATURE.COM
For daily news updates see:
www.nature.com/news
NIH conflict-of-interest rules get poor reviews p.300
LIGO resumes hunt for gravitational waves p.301
Security cameras capture new meteor showers p.302
The most interdisciplinary fields and countries are revealed p.306


WITS UNIVERSITY 


Lee Berger (front) recruited a team of wiry excavators to retrieve more than 1,500 fossils from the Dinaledi chamber in South Africa. 


PALAEOANTHROPOLOGY


Crowdsourcing digs up an 
early human species 


Palaeoanthropologist asks excavators and anatomists to study Africa’s richest fossil trove. 


BY EWEN CALLAWAY 


“Dear colleagues, I need the help of the whole community,” palaeoanthropologist Lee Berger posted on social media on 6 October 2013.

Berger, based at the University of the Witwatersrand in Johannesburg, South Africa, had just learnt of a small underground chamber loaded with early human fossils. He was looking for experienced excavators to collect the delicate remains before they deteriorated further. “The catch is this,” Berger went on. “The person must be skinny and preferably small. They must not be claustrophobic, they must be fit, they should have some caving experience.”

Less than two years after he posted this missive, Berger and his team have pieced together more than 1,500 ancient human bones and teeth from the Rising Star cave system, the biggest cache of such material ever found in Africa. The remains belong to at least 15 individuals of a previously undescribed species that the team has dubbed Homo naledi, and they may mark the oldest-known deliberate burial in human history, Berger and his colleagues report in eLife (L. R. Berger et al. eLife 4, 09560; 2015 and P. H. G. M. Dirks et al. eLife 4, 09561; 2015).
For Berger, the research marks a milestone in a campaign to transform palaeoanthropology into an open and inclusive field, in which rare fossils are rapidly shared with the scientific world instead of being squirrelled away as an elite few scrutinize them for years.

“There's lots of fossils out there no one has ever seen, except for a few select people. Palaeoanthropology is really rotten that way,” says Tracy Kivell, a palaeoanthropologist at the University of Kent in Canterbury, UK, who analysed hand bones from Rising Star and is a co-author of the paper that describes H. naledi. “Lee is changing that and setting a new standard for what we should expect.”

Palaeoanthropologist Denné Reed of the University of Texas at Austin sees Berger’s openness as part of a generational shift in the field. “We're more interested in openly sharing data,” he says. “The advantages in collaboration far outweigh any of the risks.”

A few weeks before Berger advertised for help, cavers who work with him had discovered the Dinaledi Chamber in the Rising Star cave system, about 50 kilometres northwest of Johannesburg. Berger hoped to remove the remains as soon as possible, but he needed help. The narrow chamber is about 30 metres below ground, and the only access is through a slit in the rock some 20 centimetres wide (see ‘Tough commute’). “I’m not physiologically appropriate to ever get in the system,” he says, referring to his large build. A month after his social-media post, Berger had six scientists at work in the cave.


TREASURE TROVE 

As he and other colleagues watched a video feed in a nearby tent, the excavators pulled up skulls, femurs, teeth and hundreds of other specimens. “By the end of this expedition, we had recovered more individual remains than had been discovered in South Africa in the last 90 years,” says Berger.

The research team with which Berger usually works to analyse early human remains was busy making sense of other fossils, discovered in 2008 at the nearby Malapa site. So Berger put out another social-media call, this time recruiting more than 30 early-career scientists to attend a month-long workshop to analyse and describe the fossils.

John Hawks, a palaeoanthropologist at the University of Wisconsin–Madison who helped to coordinate the Rising Star dig and workshop, says that the team took flak for its unorthodox approach. “There's a lot of the field that really believed we're just a couple of cowboys who don’t know how things should be done,” he says.

The team intends to publish at least a dozen papers from the workshop in the coming months; the two published today are the first. They describe the discovery site and the anatomy of H. naledi, the skull of which encased a small, fist-sized brain much like those of other early members of the genus Homo and of the more ancient australopiths. In other ways, its body is more like those of modern humans, with the lower limbs and feet of a biped and hands that could have gripped tools with precision. The researchers estimate that H. naledi would have stood just under 1.5 metres tall and weighed between 40 and 55 kilograms.


TOUGH COMMUTE

To access the fossil-rich Dinaledi Chamber, six slender scientists had to penetrate deep into the Rising Star cave system, squeezing themselves through a passage dubbed Superman Crawl because cavers have to extend one arm overhead (like the flying superhero) to get through.

[Graphic labels: 12-m vertical shaft; Dragon’s Back; Dinaledi Chamber]

“It is a very strange combination of features, some that we've never seen before and some that we would have never expected to find together,” says Hawks.


FAMILY RESEMBLANCE 

The researchers are unsure how H. naledi is related to other early human species that lived in Africa, such as Homo erectus and Homo habilis. They hope to date calcite deposits in the cave to establish the age of the remains, which could be more than 1 million years old.

Homo naledi’s skull looks like an australopithecine’s.

[Graphic labels: Johannesburg; South Africa; 1,480 m above sea level]

The chamber contains no evidence that early humans lived there, and no bones from species other than H. naledi, so Berger believes that it might be a deliberate burial, and possibly the oldest known. Currently, the oldest site that seems to represent an early human burial is Sima de los Huesos in the Atapuerca Mountains of Spain, which dates to 430,000 years ago.

Fred Spoor, a palaeontologist at University College London, agrees that the bones represent a previously unknown Homo species, and says that Berger's team makes a good case, after considering other alternatives, that the remains were deposited intentionally. He is eager to see what other experts make of it.

However, Jeffrey Schwartz, an evolutionary biologist at the University of Pittsburgh in Pennsylvania, thinks that the material is too varied to represent a single species. “I could show those images to my students and they would say that they're not the same,” he says. One of the skulls looks more like it comes from an australopithecine, he adds, as do certain features of the femurs.

Schwartz and others will soon get the chance to judge the Rising Star remains for themselves. Berger’s team has uploaded data including 3D scans of the remains to the MorphoSource repository, and welcomes other researchers to study the material at first hand. Berger did the same with remains of a species called Australopithecus sediba that were discovered at the Malapa site.

Schwartz says that he has had trouble accessing some researchers’ hominin remains even after they had been described in a journal. But when he asked Berger’s team if he could purchase A. sediba casts several years ago, he got them for free. “How good can you be?” says Schwartz. “It’s been refreshing and delightful that Lee Berger has always made his specimens accessible.”


SOURCE: P. H. G. M. Dirks et al. eLife 4, 09561 (2015)
Africa braced for snakebite crisis


Health specialists warn that stocks of antivenom will run out in 2016.


BY QUIRIN SCHIERMEIER 


Rural Africa is facing a resurgence of a persistent plague that rarely makes headlines: snakebite.

By June next year, stockpiles of the antivenom that is most effective against Africa's vipers, mambas and cobras are expected to run out because the only company that makes the medicine has stopped production. With no adequate replacement in sight, the death toll from bites is set to rise, specialists warned at a tropical-medicine congress last week in Basel, Switzerland.

“We're dealing with a neglected health crisis that is turning into a tragedy for Africa,” says Gabriel Alcoba, a medical adviser with the international humanitarian group Médecins Sans Frontières (MSF; also known as Doctors Without Borders).

Poisonous snakes might seem an archaic menace in such a rapidly urbanizing world. Yet by cautious estimates, snakebites kill more than 100,000 people worldwide every year (see ‘Death toll’), more, on average, than lose their lives in natural disasters. And survivors often experience permanent physical and mental disabilities.

In 2010, the French drug firm Sanofi Pasteur in Lyon ceased production of Fav-Afrique, an antibody serum that reduces the quantity of venom circulating in the blood of a snakebite victim. Made from the purified plasma of horses previously injected with small quantities of snake venom, the serum neutralizes the venom of many of Africa's most dangerous snakes. The antidote has saved many people from bites by deadly species such as the carpet viper (Echis ocellatus), common in West Africa, and the black mamba (Dendroaspis polylepis), found across the sub-Saharan region. But the high costs (US$250-500 per person) and a supply shortage mean that only about 10% of snakebite victims in Africa get treatment, and the company says that producing the antidote is no longer profitable. Cheaper products from competitors have forced Sanofi Pasteur out of the African market, says Alain Bernal, a company spokesman. Sanofi Pasteur is working to enable the transfer of know-how to companies willing to take over production of Fav-Afrique, he says.

The deadly carpet viper (Echis ocellatus).


DEATH TOLL

Fuzzy estimates of snakebite fatalities

It is uncertain how many people are bitten or die from snakebites in sub-Saharan Africa. But according to Médecins Sans Frontières (MSF; also known as Doctors Without Borders), whose health-care workers treat snakebites through field programmes in the Central African Republic and South Sudan, an estimated 30,000 people die each year and at least 8,000 more undergo amputations.

But snakebite mortality could be much higher than anecdotal reports suggest. For some countries, including the Democratic Republic of Congo (home to an enormous number of venomous snakes), there are no reliable data, says tropical-medicine specialist David Warrell at the University of Oxford, UK.

Under-reporting is not limited to Africa. The authors of a nationally representative snakebite-mortality survey, published in 2011, deduced that, despite the availability of antidotes, around 46,000 people in India die of snakebites every year (B. Mohapatra et al. PLoS Negl. Trop. Dis. 5, e1018; 2011). India’s Central Bureau of Health Intelligence reported merely 1,219 and 985 fatal bites for 2009 and 2010, respectively. One reason for the discrepancy, says Warrell, who co-authored the study, is that many victims of snakebites die before they reach a hospital, or waste precious time with traditional healers before seeking more-conventional medical help. Q.S.

Pharmaceutical companies in South Africa, India, Mexico and Costa Rica are among those marketing cheaper products, some of which work well against snakes in their host nations. But their safety and effectiveness against the large variety of species in Africa have not yet been established in clinical trials. To speed up the process, MSF is offering two of its hospitals in the Central African Republic (CAR) and South Sudan as study sites. But it will take at least two years to validate the products in development, and none is as broadly effective as Fav-Afrique, Alcoba says.


NEGLECTED THREAT 

Although just now becoming critical, Africa's snakebite problem has been smouldering for years, says tropical-medicine specialist David Warrell of the University of Oxford, UK, who consults for the World Health Organization (WHO). Snakebite fatalities have been rising over the past decade in the CAR, Ghana and Chad, in part owing to a failure to train enough medical staff, ignorance in health ministries and “unscrupulous marketing” of inappropriate antivenoms, he says. “War-torn countries have many other problems. But the millions of children, poor farmers and nomadic people at risk of snakebites just don't have the ear of politicians in capital cities.”

And according to Warrell, the WHO has done little to help. To improve the safety and efficacy of antibodies, the agency has released guidelines for producing antivenoms. But it has no formal programme for improving treatment by training medical workers, advising ministries or educating communities, as it does for 17 other neglected tropical diseases, including dengue and sleeping sickness. And yet, says Warrell, snakebites cause more deaths than do all 17 diseases put together.

Warrell says that, while waiting for clinical trials to bring replacements for Fav-Afrique to the market, the keys to reducing the risk of snakebite are education and preventive measures, such as wearing proper shoes, using a light when walking home from the fields and sleeping above ground level, beneath a mosquito net.

Thankfully, says Alcoba, the global-health community is starting to grasp the urgency of the situation. “People used to laugh when we talked about snakebites,” he says. “They don't laugh any more.”
NEWS IN FOCUS


NIH disclosure rules falter 


Regulations that require researchers to disclose conflicts of interest yield questionable data and cost universities millions.


BY SARA REARDON 


When a US Senate investigation in 2008 revealed that psychiatrist Charles Nemeroff of Emory University in Atlanta, Georgia, had not disclosed at least US$1.2 million in income from drug companies, Senator Charles Grassley decided to do something about it. The Iowa Republican led a charge to push the National Institutes of Health (NIH), which funded Nemeroff’s research, to change how it evaluates researchers who accept money from industry.

The resulting reforms, which took effect in 2012, require scientists to report industry connections in greater detail than before, and charge institutions with determining which ties are problematic. But three years later, it is not clear what the costly, cumbersome rules have accomplished. A Nature analysis suggests that institutions have vastly different standards for what constitutes a conflict, and that they classify relatively few relationships between researchers and industry as troublesome.

“There’s a lot more financial conflict of interest in my view than the NIH is getting from the reports of universities,” says Sheldon Krimsky, who studies conflict-of-interest issues at Tufts University in Medford, Massachusetts. “We're just seeing the tip of the iceberg.”

The reforms, enacted by the NIH’s parent agency, the Department of Health and Human Services (HHS), do seem to have increased the number of financial relationships that researchers report to their universities, by 45% overall, according to data from 56 universities in a survey released in April by the Association of American Medical Colleges (AAMC) in Washington DC (see go.nature.com/hc5r2b). But the number of conflicts that institutions reported to the NIH has increased only slightly, according to NIH data obtained by Nature through a freedom-of-information request (see ‘Under the microscope’).

The agency’s original conflict-of-interest regulations, implemented in 1995, required institutions to report when an HHS-funded researcher received more than $10,000 from an outside source. The revised rule lowered that threshold to $5,000 and directed researchers to disclose a wider variety of potential conflicts, such as sponsored travel and relationships with non-profit organizations.

Institutions, which receive conflict-of-interest reports from their researchers annually, must then convene an internal panel to determine whether a particular relationship could affect a researcher's work. If so, the panel designs a ‘management plan’ that may require the researcher to disclose the conflict in publications or, in some cases involving human subjects, to stand down as the study’s primary investigator. Institutions then send these plans to the NIH.

Universities have spent millions of dollars and hired extra staff to comply with these reforms, and most administrators are furious about the burden. “We already had an annual disclosure process for all the faculty,” says Andrew Rudczynski, associate vice-president for research administration at Yale University in New Haven, Connecticut. “I can't see a single benefit to it.”

Yale spent $500,000 to implement the revised NIH rules. In the year after they took effect, the number of disclosures by the university’s researchers doubled, but Yale identified just one new conflict, Rudczynski adds. Other universities report similar experiences.

And whereas the HHS had estimated that the roughly 2,000 institutions that it funds would spend $23.2 million a year to comply with the regulations, the AAMC survey suggests that the true cost has been much higher. Just 71 institutions spent a total of $23 million in the year after the reforms took effect, although their costs going forward may be lower.


UNDER THE MICROSCOPE

Through a freedom-of-information request, Nature obtained conflict-of-interest reports submitted to the US National Institutes of Health (NIH). For more on our methodology, see go.nature.com/11pjj6

OUTLOOK HAZY

Data from the NIH, which cover the period from August 2012 to May 2015, suggest that the number of conflicts of interest that an institution reports does not always reflect the number of grants that it receives.

[Scatter plot: number of financial conflict-of-interest disclosures per institution (0-140). Labelled institutions include the University of California, San Francisco; Stanford University; Pennsylvania State University Hershey Medical Center; Johns Hopkins University; Yale University; Massachusetts General Hospital; University of Wisconsin–Madison; and University of Texas, Austin.]

SMALL CLAIMS

Institutions reported 2,523 financial conflicts of interest between January 2013 and May 2015. Most were relatively small. 31 claims involved $1 million or more. The largest confirmed by Nature was $13 million. HHS reforms lowered the reporting threshold from $10,000 to $5,000. Determining the financial value of some relationships, such as a stake in a start-up company, can be difficult.

[Chart: number of financial conflict-of-interest disclosures by cost of claim (US$, thousands).]

SOURCE: NIH

300 | NATURE | VOL 525 | 17 SEPTEMBER 2015
© 2015 Macmillan Publishers Limited. All rights reserved

Paul Thacker, who led the 2008 Senate investigation as a member of Grassley’s staff, admits that it is difficult to know how well the reforms are working. That is largely because the potential benefits of greater disclosure of financial ties, such as peer reviewers giving closer scrutiny to studies by researchers with conflicts, are tough to measure.

Still, Thacker says, there is a clear need for closer scrutiny. This is backed up by evidence showing that studies funded by private sources, such as drug firms, more often produce results that benefit the funder than do publicly funded studies (A. Lundh et al. Cochrane Database Syst. Rev. 12, MR000033; 2012). And Thacker has little sympathy for universities’ complaints. “It just shows that they still don’t get what the problem is,” he says. “They’re in this place today because they've failed to create confidence for the public in the past.”

Others worry that the HHS policy is still not strict enough. Krimsky says that the current rules may give institutions too much power to assess conflicts, without accounting for ways that universities themselves can be compromised by ties to government or industry. This could be one reason why the HHS reforms did not significantly increase the number of reported conflicts, Krimsky adds.

Those pushing for greater transparency are also frustrated that the NIH does not require institutions to publish information about researchers’ conflicts and management plans online. Instead, members of the public must ask a university for information on a researcher’s conflicts; the institution has five days to disclose dollar amounts and sources. Nonetheless, the NIH Office of Extramural Research says that about 50% of institutions that submit conflict-of-interest reports have voluntarily created online databases, although these vary in usability and completeness.
Requesting such information from universities directly also produces mixed results. Nature contacted 20 public and private institutions that had reported individual researchers with conflicts of interest involving more than $1 million, seeking details on these relationships. The majority of these institutions responded immediately, but some took as long as two weeks to respond, directed Nature’s reporter to the media office, or instructed her to submit a freedom-of-information request. Most declined to share information about conflicts that occurred before the current calendar year, which is not required by the HHS.

Nor does the department require the release of management plans, which troubles Tobin Smith, vice-president for policy at the Association of American Universities in Washington DC. “If you disclose that there is a conflict but don't disclose how the university is managing it (which is not part of the regulations), the public doesn’t understand the relationship,” he says.

The NIH also struggles to defend its own regulations. “One could debate whether or not we needed to promulgate a new rule,” says Sally Rockey, director of the NIH Office of Extramural Research. “At the time, there was a lot of scrutiny in the press and Congress got involved.” She concedes that the reforms were mostly in response to this outside pressure. (Grassley declined to comment on the regulations.)

And it is unclear whether the revised regulations would have identified Nemeroff, who did not tell Emory about his industry relationships. “Science and research are built on trust, and we are still at the mercy of what’s disclosed to us,” says Eric Mah, senior director of research compliance at the University of California, San Francisco.

The NIH plans to review the conflict-of-interest reforms later this year, to develop best practices for compliance. The agency will examine data on the type and number of reported conflicts, as well as institutions’ experiences of complying with the requirements. But Rockey says that the HHS is unlikely to make significant changes to the rules, given that they took four years to develop.

In the meantime, research institutions are caught in a bind. The 1980 law that allows US universities to patent inventions encourages relationships with industry, and tight federal research budgets are driving more scientists to seek support from private funders. “There are no easy answers,” Thacker says. “Universities are being pushed into greater reliance on industry funding and until that reverses, these problems just become more and more complicated.”


Hunt for cosmic waves to resume 


Upgraded LIGO detectors will improve chances of finding ripples in space-time.


BY DAVIDE CASTELVECCHI 


Almost 100 years after Einstein presented his general theory of relativity in a Berlin lecture theatre, the quest to spot the gravitational waves he predicted may be entering its final stages.

This week, the world’s largest gravitational- 
wave facility is expected to start collecting data 
again after a 5-year US$200-million overhaul. 
The Laser Interferometer Gravitational-Wave 
Observatory (LIGO) searched fruitlessly for 
these cosmic ripples for almost a decade in 
the 2000s. But the odds that its improved ver- 
sion — known as Advanced LIGO — will detect 
any waves in the next three months may be as 
high as one in three, according to some of the 
physicists involved in the experiments. 


Initial tests have shown that the observatory’s 
twin detectors, in Washington state and Louisi- 
ana, are performing as expected, says Gabriela 
Gonzalez, spokesperson for the 900-strong 
LIGO Scientific Collaboration. And that is 
no mean feat for an instrument that has cost 
$620 million so far. “It’s the first time that any- 
thing in this field is on budget and on sched- 
ule,” says Karsten Danzmann, director of the 
Max Planck Institute for Gravitational Physics, 
in Hannover, Germany, who is not part of the 
LIGO management team. 

According to general relativity, gravitation 
originates from the interplay between massive 
objects and the malleable fabric of space-time. 
Einstein predicted that accelerating masses such 
as colliding neutron stars or black holes would 
disturb that fabric and produce gravitational ripples that propagate through the Universe.

Each of LIGO’s detectors is designed to meas- 
ure the deformation of space-time by compar- 
ing changes in the paths of laser beams that race 
down its two perpendicular 4-kilometre-long 
arms, bounce between mirrors and interfere 
with each other back at their source. When a 
gravitational wave passes through, it slightly 
alters the lengths of the arms, and the obser- 
vatory can spot such changes with a sensitiv- 
ity of one part in 10”. That is comparable to a 
hair’s-width change in the distance from the Sun 
to Alpha Centauri, its nearest star, says Laura 
Cadonati, a physicist at the Georgia Institute of 
Technology in Atlanta who will be coordinating 
the experiment's data analysis. 
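The hair's-width comparison can be turned into numbers with a quick order-of-magnitude check. This is a minimal sketch that assumes a hair width of about 100 micrometres and a Sun to Alpha Centauri distance of about 4.37 light years; neither figure comes from the article. It also applies the resulting fractional change to one of LIGO's 4-kilometre arms.

```python
# Order-of-magnitude check of the hair's-width analogy.
# ASSUMPTIONS (not from the article): hair width ~100 micrometres,
# Sun to Alpha Centauri ~4.37 light years.
LIGHT_YEAR_M = 9.4607e15   # metres in one light year
HAIR_WIDTH_M = 100e-6      # assumed hair width, metres
ALPHA_CEN_LY = 4.37        # assumed distance, light years
ARM_LENGTH_M = 4_000.0     # LIGO arm length given in the text (4 km)


def fractional_change(delta_m: float, length_m: float) -> float:
    """Return the dimensionless strain: change in length divided by length."""
    return delta_m / length_m


distance_m = ALPHA_CEN_LY * LIGHT_YEAR_M
strain = fractional_change(HAIR_WIDTH_M, distance_m)
arm_change_m = strain * ARM_LENGTH_M  # the same strain applied to a 4 km arm

print(f"distance to Alpha Centauri: {distance_m:.2e} m")
print(f"implied strain:             {strain:.1e}")
print(f"change in a 4 km arm:       {arm_change_m:.1e} m")
```

With these assumptions the analogy corresponds to a fractional change of a few parts in 10^21, and to a change in a single 4-km arm of roughly 10^-17 m.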

A crucial part of the improvement is better damping of the vibrations caused by less-than-heavenly sources. The problem
was especially acute at the site in Livingston, 
Louisiana, which is in the middle of a timber 
plantation. Any felling of trees would disturb 
the detector, so it could keep its laser beams ‘in lock’ — vibrating at precise frequencies — only
at night or on weekends. A passing train would 
knock the site out for an hour, says physicist 
Brian O'Reilly, who will coordinate the follow- 
up of detections at the Livingston site. But now, 
he says, the detector should be able to take data 
over several days at a time without interruption. 

Advanced LIGO is already three times 
more sensitive than its predecessor, but in 
three months’ time it will shut down for more 
improvements that will make it ten times more 
sensitive. When it reopens around 9 months 
later, it should be able to spot cosmic ripples 
from cataclysmic events — such as the collisions of black holes — up to 120 megaparsecs (about 390 million light years) away on a regular basis and sample a volume of space 1,000 times greater than that sampled by the original observatory.
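The step from "ten times more sensitive" to "a volume 1,000 times greater" follows if a detector's reach grows in proportion to its sensitivity, so that the volume it surveys grows as the cube of that reach. A minimal sketch under that assumption, which treats space as Euclidean and ignores cosmological effects at large distances:

```python
def volume_gain(sensitivity_gain: float) -> float:
    """Volume of space surveyed, relative to a reference detector.

    Assumes (a simplification for illustration) that reach scales linearly
    with sensitivity and that sources fill a sphere of that radius, so the
    surveyed volume scales as reach cubed.
    """
    if sensitivity_gain <= 0:
        raise ValueError("sensitivity gain must be positive")
    reach_gain = sensitivity_gain   # reach grows linearly with sensitivity
    return reach_gain ** 3          # volume grows as reach cubed


for label, gain in [("Advanced LIGO now", 3), ("after next upgrade", 10)]:
    print(f"{label}: {gain}x sensitivity -> {volume_gain(gain):g}x volume")
```

Under this assumption the current threefold improvement already corresponds to about 27 times the original volume, and the planned tenfold improvement gives the 1,000-fold figure.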

Next year, LIGO will be joined by a slightly 
smaller €200-million (US$226-million) Franco- 
Italian detector near Pisa, Italy, called Advanced 
Virgo, which is undergoing its own upgrade. 
The LIGO and Virgo teams will pool their data 
to check each other’s detections. They expect 
to see waves from mergers of binary neutron 
stars — events that should generate strong, pre- 
dictable signals — but do not know precisely 
how many to anticipate. “It could be, depending 
on the models, ten binary neutron star detections a year or so,” Gonzalez says. “But it could be 10 times higher or 100 times lower.”
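Gonzalez's range spans three orders of magnitude, from about 0.1 to about 100 binary-neutron-star detections a year. One way to see what that means for an observing run is to treat detections as a Poisson process (our assumption, not a model stated by the collaboration), in which the chance of at least one detection in time t at rate λ is 1 − exp(−λt). A minimal sketch:

```python
import math

CENTRAL_RATE_PER_YEAR = 10.0  # "ten ... detections a year or so"


def p_at_least_one(rate_per_year: float, years: float) -> float:
    """P(N >= 1) for a Poisson process with the given rate and duration."""
    return 1.0 - math.exp(-rate_per_year * years)


rates = {
    "100 times lower": CENTRAL_RATE_PER_YEAR / 100,
    "central estimate": CENTRAL_RATE_PER_YEAR,
    "10 times higher": CENTRAL_RATE_PER_YEAR * 10,
}
for label, rate in rates.items():
    print(f"{label:>16}: {rate:g} per year, "
          f"P(at least one in a year) = {p_at_least_one(rate, 1.0):.3f}")
```

At the low end of the range, the chance of even one detection in a year of observing is under 10%; at the central estimate it is close to certain.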

“The first detections will be quite dramatic for us,” says Rainer Weiss, a physicist at the Massachusetts Institute of Technology in Cambridge who was one of LIGO’s founders. “The first thing we will need to sort out is whether we truly believe what we are seeing.”

Having detectors on different continents is 
crucial for providing a rough estimate of the 
origin of the waves, says Fulvio Ricci, a physi- 
cist at the Sapienza University of Rome and the 
spokesperson for Virgo. Once they know that, 
astronomers will be able to look for other signs 
of that event using electromagnetic radiation, 
such as X-rays or visible light. 

Einstein published his first papers on gravita- 
tional waves in 1916. Detecting these ripples a 
century later, Weiss says, would be of “enormous 
symbolic importance”.
ASTRONOMY

Dates added to meteor calendar

Skywatching cameras spot 86 previously unknown events.

BY ALEXANDRA WITZE

The list of meteor showers that occur every year has just grown longer. Eighty-six previously unknown showers have now joined the regular spectaculars, which include the Perseids, Leonids and Geminids. Astronomers spotted the shooting-star shows using a network of video cameras designed to watch for burglars, but repurposed to spy cosmic debris burning up in Earth’s atmosphere.

A meteor (upper left) streaks through the Orion constellation during the Perseid shower.

MORE NEWS
• Why marine life needs protection from noise pollution go.nature.com/qdldtz
• California snowpack lowest in past 500 years go.nature.com/c2gul3
• Scientists trial humane shark deterrents go.nature.com/uqlhll
• Southern Ocean sucks up more carbon dioxide than was thought go.nature.com/wfh4vh

The newfound showers are faint but impor- 
tant: each is fuelled by Earth’s passage through 
a trail of particles left behind by a comet or 
asteroid, so mapping them reveals previously 
unknown sources of dust. 

“The cool thing is, we are not just doing 
surveillance of meteors in the night sky,” 
says Peter Jenniskens, an astronomer at the 
SETI Institute in Mountain View, California.
“Now we also have a three-dimensional pic- 
ture of how dust is distributed in the Solar 
System.” 

Most of the particles are the size of a sand 
grain, but a few are large enough to survive 
the searing heat of their passage through 
the atmosphere — and possibly do damage 
on Earth’s surface. Jenniskens and his col- 
leagues describe the discoveries in four papers 
accepted for publication in Icarus. 

Astronomers have been documenting 
meteors for centuries, first by eye and more 
recently with radar and video-tracking systems. 
Meteors sprinkle Earth steadily throughout the 
year, but during a shower a significant num- 
ber seem to originate from the same point in 
the sky. Skywatchers around the world have 
reported more than 750 possible meteor show- 
ers to the International Astronomical Union 
(IAU) — but only a small fraction of those have
been confirmed as bona fide events. 
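The defining feature of a shower, meteors that seem to come from the same point in the sky, suggests a simple way to group observations: compute the angular separation between meteor radiants and gather those that lie close together. The sketch below is a deliberately simplified illustration, not the method used by CAMS or the IAU (real surveys also compare velocities and orbits). The 5-degree threshold and the sample radiants are hypothetical.

```python
import math
from typing import List, Tuple

Radiant = Tuple[float, float]  # (right ascension, declination) in degrees


def angular_separation_deg(a: Radiant, b: Radiant) -> float:
    """Great-circle distance between two sky positions (haversine formula)."""
    ra1, dec1 = map(math.radians, a)
    ra2, dec2 = map(math.radians, b)
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def group_radiants(radiants: List[Radiant],
                   max_sep_deg: float = 5.0) -> List[List[Radiant]]:
    """Greedy grouping: each radiant joins the first existing group that has a
    member within max_sep_deg (hypothetical threshold); otherwise it starts a
    new group. Unlike full single-linkage clustering, groups are never merged."""
    groups: List[List[Radiant]] = []
    for r in radiants:
        for g in groups:
            if any(angular_separation_deg(r, m) <= max_sep_deg for m in g):
                g.append(r)
                break
        else:
            groups.append([r])
    return groups


# Hypothetical radiants: two tight clusters and one isolated meteor.
sample = [(48.0, 58.0), (49.5, 57.0), (47.0, 59.5),
          (152.0, 22.0), (153.0, 21.0), (300.0, -10.0)]
groups = group_radiants(sample)
print([len(g) for g in groups])  # → [3, 2, 1]
```

Groups with several members would be shower candidates; a lone radiant corresponds to the "random singletons" that make up most of the CAMS data.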


SKY SURVEILLANCE 

Jenniskens’ team set up cameras at three loca- 
tions in northern California to confirm or 
rule out these rumoured showers. The Cam- 
eras for Allsky Meteor Surveillance (CAMS) 
project points 60 security cameras in different 
directions to capture as many shooting stars as 
possible. Each has a relatively narrow field of 


view, but together they cover a broad dome of 
sky centred directly overhead and extending 
down to 30° above the horizon. 
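How much of the sky does that dome represent? Treating the coverage as an idealized spherical cap reaching from the zenith down to an elevation of 30° (real camera footprints overlap and leave gaps), its solid angle follows from integrating over elevation e and azimuth φ:

```latex
\Omega = \int_{0}^{2\pi}\!\int_{30^\circ}^{90^\circ} \cos e \,\mathrm{d}e\,\mathrm{d}\phi
       = 2\pi\left(\sin 90^\circ - \sin 30^\circ\right)
       = 2\pi\left(1 - \tfrac{1}{2}\right) = \pi\ \text{sr}
```

That is half of the 2π sr visible above a flat horizon, or a quarter of the whole sky.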

“CAMS is about getting massive data sets on meteors, so you can see through all the scatter to get at those new showers,” says Phil Bland, a planetary scientist at Curtin University in Perth, Australia. He helps to run a tracking network in the Australian outback that looks for extremely bright meteors in an effort to recover meteorites.

Since it began in 2010, CAMS has measured more than 250,000 meteors.
Of those, about three-quarters were random 
singletons and one-quarter came in showers. 
CAMS has confirmed 81 showers that were on 
the IAU’s questionable list, and discovered 86 
new ones. 

Among these is one that lights up South- 
ern Hemisphere skies in early December, and 
seems to radiate from the constellation Vela. 
It is surprisingly strong for a shower that had 
not been noticed before, says Jenniskens. 
During the March 2013 peak of a newly con- 
firmed shower, skywatchers saw the bright 
flash of a rock-sized object hitting the Moon. 
The CAMS team has been expanding its search by setting up smaller camera networks in the Netherlands and New Zealand. “The more we sample the sky,” says Jenniskens, “the more detailed our picture becomes of what is coming in.”


CORRECTIONS 

The News story ‘Encryption faces quantum 
foe’ (Nature 525, 167-168; 2015) incorrectly 
named the location for the cryptography 
workshop that began on 6 September. The 
workshop was held at the Schloss Dagstuhl-Leibniz Centre for Informatics in Wadern, not the Leibniz Center for Informatics in Oktavie-Allee.

The News Feature ‘Fishing for the first 
Americans’ (Nature 525, 176-178; 2015) 
incorrectly credited the photo taken at 
Cooper’s Ferry. Credit should have gone to 
Hayden Wilcox, not Joanne McSporran. 

The News story ‘Health study set to decide 
data policy’ (Nature 525, 16-17; 2015) 
incorrectly stated that an NIH working group 
planned to create a blanket data-sharing 
policy for the Precision Medicine Initiative. It is in fact developing a policy that can
accommodate participants’ varying interest 
in seeing their own genetic information. 
SPECIAL

INTERDISCIPLINARITY

Scientists must work together to save the world. A special issue asks how they can scale disciplinary walls.

To solve the grand challenges facing society — energy, water, climate, food, health — scientists and social scientists must work together. But research that transcends conventional academic boundaries is harder to fund, do, review and publish — and those who attempt it struggle for recognition and advancement (see World View, page 291). This special issue examines what governments, funders, journals, universities and academics must do to make interdisciplinary work a joy rather than a curse.

A News Feature on page 308 asks where the modern trend for interdisciplinary research came from — and finds answers in the proliferation of disciplines in the twentieth century, followed by increasingly urgent calls to bridge them. An analysis of publishing data explores which fields and countries are embracing interdisciplinary research the most, and what impact such research has (page 306). On page 313, Rick
Rylance, head of Research Councils UK and 
himself a researcher with one foot in literature 
and one in neuroscience, explains why interdis- 
ciplinarity will be the focus of a 2015-16 report 
from the Global Research Council. Around the 
world, government funding agencies want to 
know what it is, whether they should invest
in it, whether they are doing so effectively and, 
if not, what must change. 

How can scientists successfully pursue 
research outside their comfort zone? Some 
answers come from Rebekah Brown, director 
of Monash University’s Water for Liveability 
centre in Melbourne, Australia, and her col- 
leagues. They set out five principles for suc- 
cessful interdisciplinary working that they have 
distilled from years of encouraging researchers 
of many stripes to seek sustainability solutions 
(page 315). Similar ideas help scientists, curators and humanities scholars to work together on a
collection that includes clay tablets, papyri, 
manuscripts and e-mail archives at the John 
Rylands Research Institute in Manchester, UK, 
reveals its director, Peter Pormann, on page 318. 
Finally, on page 319, Clare Pettitt reassesses 
the multidisciplinary legacy of Richard Francis 
Burton — Victorian explorer, ethnographer, 
linguist and enthusiastic amateur natural 
scientist who got some things very wrong, 
but contributed vastly to knowledge of other 
cultures and continents. Today’s would-be 
interdisciplinary scientists can draw many les- 
sons from those of the past — and can take our 
polymathy quiz online at nature.com/inter.


INTERDISCIPLINARITY
A Nature special issue
nature.com/inter
An analysis reveals the extent and impact of research that bridges disciplines.

BY RICHARD VAN NOORDEN

Interdisciplinary work is considered crucial by scientists, policymakers and funders — but how widespread is it really, and what impact does it have? Scholars say that the concept is complex to define and measure, but efforts to map papers by the disciplines of the journals they appear in and by their citation patterns are — tentatively — revealing the growth and influence of interdisciplinary research.


Interdisciplinary research is on the rise

[Chart: fraction of references in natural sciences and engineering papers, and in social sciences papers, pointing to the same speciality, to other specialities in the same discipline and to other disciplines, 1950–2010; papers with “interdisciplinar*” in the title (%), social sciences and humanities versus natural sciences and engineering, 1950–2010.]

REFERENCES
Since the mid-1980s, research papers have increasingly cited work outside their own disciplines. The analysis shown here used journal names to assign more than 35 million papers in the Web of Science to 14 major conventional disciplines (such as biology or physics) and 143 specialities. The fraction of paper references that point to work in other disciplines is increasing in both the natural and the social sciences. The fraction that points to another speciality in the same discipline (for example, a genetics paper pointing to zoology) shows a slight decline.

RHETORIC
Discourse about interdisciplinary research is increasing. The fraction of papers that mention interdisciplinarity in their title has fluctuated, perhaps reflecting the priorities of funders, but the twenty-first century saw that proportion reach an all-time high.

Interdisciplinary research takes time to have an impact

IMPACT
[Chart: three years after publication, less impact (citations decrease as a paper’s interdisciplinarity increases); thirteen years after publication, more impact (citations increase as a paper’s interdisciplinarity increases).]
Whether interdisciplinary […] contentious. Over three years, papers with diverse references tend to pick up fewer citations than the norm, but over 13 years they gain more. Some studies suggest that a little […]

Some fields are more interdisciplinary than others... and so are some countries

MOST INTERDISCIPLINARY COUNTRIES
A 2015 study by researchers with the publisher Elsevier defined interdisciplinary papers as those that reference journals that are rarely cited together. The report looked only at countries that routinely publish more than 30,000 papers per year to find the ‘most interdisciplinary’ countries for 2013.
[Chart: publications in the world’s top 10% of interdisciplinary papers (%), for India, mainland China, Taiwan, South Korea, Brazil, Italy, the United States, Japan, the United Kingdom and Germany.]

A separate analysis counted the proportion of a paper’s references that are in other disciplines. After totting up all the papers for each country, and normalizing the results (so that average interdisciplinarity = 1), similar nations emerge on top for 2013:
1. China (1.09*)
2. India (1.07)
3. Taiwan (1.06)
4. Brazil (1.04)
5. Australia (1.02) and South Korea (1.02)
*9% higher than the world’s average interdisciplinarity.
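The normalized country score described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysts' actual code: each paper is scored as the fraction of its references that fall in a different discipline (disciplines being assigned from journal names), each country's score is the mean over its papers, and "average interdisciplinarity = 1" is taken to mean dividing by the mean over all papers. The records and country labels below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Paper:
    country: str
    discipline: str                   # discipline of the paper's own journal
    reference_disciplines: List[str]  # discipline of each cited journal


def outside_fraction(paper: Paper) -> float:
    """Fraction of a paper's references that point to other disciplines."""
    refs = paper.reference_disciplines
    if not refs:
        raise ValueError("paper has no references to score")
    outside = sum(1 for d in refs if d != paper.discipline)
    return outside / len(refs)


def normalized_country_scores(papers: List[Paper]) -> Dict[str, float]:
    """Mean outside-fraction per country, divided by the mean over all scored
    papers so that the overall (paper-weighted) average equals 1.

    Papers without references are skipped (an assumption of this sketch).
    """
    scored = [(p, outside_fraction(p)) for p in papers if p.reference_disciplines]
    if not scored:
        return {}
    overall = sum(s for _, s in scored) / len(scored)
    if overall == 0:
        raise ValueError("no references point outside their discipline")
    by_country: Dict[str, List[float]] = defaultdict(list)
    for paper, score in scored:
        by_country[paper.country].append(score)
    return {c: (sum(v) / len(v)) / overall for c, v in by_country.items()}


# Hypothetical toy records for two made-up countries, "A" and "B".
papers = [
    Paper("A", "biology", ["biology", "chemistry", "physics", "biology"]),
    Paper("A", "physics", ["physics", "mathematics"]),
    Paper("B", "chemistry", ["chemistry", "chemistry", "chemistry", "biology"]),
    Paper("B", "mathematics", ["mathematics", "mathematics"]),
]
scores = normalized_country_scores(papers)
for country, score in sorted(scores.items()):
    print(f"{country}: {score:.2f}")  # A: 1.60, then B: 0.40
```

On a scale built this way, a score of 1.09 would sit 9% above the paper-weighted average.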
Interdisciplinarity has become all the rage as scientists tackle society’s biggest problems. But there is still strong resistance to crossing borders.

BY HEIDI LEDFORD

Asking for US$40 million is never easy, but Theodore
Brown knew his pitch would be a particularly tough 
sell. As vice-chancellor for research at the University 
of Illinois at Urbana-Champaign in the early 1980s, 
Brown had been tasked with soliciting a major dona- 
tion from wealthy chemist and entrepreneur Arnold Beckman, 
a graduate of the university. Beckman was hesitant, believing 
that the university should receive most of its support from the 
state. So Brown decided to devise a project like nothing he had 
ever seen before. 

In 1983, he and his colleagues put together a proposal for an 
institute that had little chance of being funded through normal 
channels. It would defy the powerful disciplinary cartography 
that defines many modern universities, bringing together 
members of different departments and inducing them to work 
together on common projects. Brown argued that it would allow 
faculty members to tackle bigger scientific and societal ques- 
tions than they normally could. 

“The problems challenging us today, the ones really worth working on, are complex, require sophisticated equipment and intellectual tools, and just don’t yield to a narrow approach,” he says. “The traditional structure of university departments and colleges was not conducive to cooperative, interdisciplinary work.”

It was an early example of the push for interdisciplinary 
research that is now sweeping universities around the globe. 
Although Brown was not completely alone — the interdiscipli- 
nary Santa Fe Institute in New Mexico was founded around the 
same time — he was advocating crossing boundaries before it 
became fashionable. And his proposal met strong resistance.
Department heads fretted that faculty members — and their 
grants — would be snatched away. Some colleagues scorned 
Brown's idea of creating open office spaces to foster interac- 
tions between graduate students: surely the din would make 
it impossible to get serious work done. And then there was 
the stigma. “Interdisciplinary research is for people who aren’t good enough to make it in their own field,” an illustrious physicist chided.

But Beckman liked the idea and committed the full 
$40-million asking price — at that time, the largest-ever 
private donation to a US public university. A few hectic 
years later, the 29,000-square-metre Beckman Institute for 
Advanced Science and Technology was born. 

The institute struggled to recruit a qualified director willing 
to take a chance on the new model, so Brown took the helm. 
Soon, large grants from organizations such as the Department 
of Defense and the National Science Foundation poured in, 
hushing many critics. By the time Brown left the institute in 
1993, other leading universities were sending delegations there 
to learn from the model. Researchers from Beckman — which 
now has more than 200 affiliated faculty members — have 
achieved attention-grabbing results, including helping to cre- 
ate one of the first graphical web browsers. 

Since the Beckman was founded, the interdisciplinary 
model has spread around the world, countering the trend 
towards specialization that had dominated science since the 
Second World War. Cross-cutting institutes have sprouted 
up in the United States, Europe, Japan, China and Australia, 
among other places, as researchers seek to solve complex 
problems such as climate change, sustainability and public- 
health issues. The interdisciplinary trend can be seen in pub- 
lication data, where more than one-third of the references in 
scientific papers now point to other disciplines (see page 306). 
“The problems in the world are not within-discipline prob- 
lems,” says Sharon Derry, an educational psychologist at the 
University of North Carolina at Chapel Hill who studies inter- 
disciplinarity. “We have to bring people with different kinds 
of skills and expertise together. No one has everything that’s 
needed to deal with the issues that we’re facing.”

Even so, supporters of interdisciplinary research say that it 
has been slow to catch on, and those who do cross academic 
disciplines face major challenges when applying for grants, 
seeking promotions or submitting papers to high-impact 
journals. In many cases, scientists say, the trend is nothing 
more than a fashionable label. “There's a huge push to call 
your work interdisciplinary,” says David Wood, a bioengineer
at the University of Minnesota in Minneapolis. “But there’s 
still resistance to doing actual interdisciplinary science.” 


HIGHLY DISCIPLINED 


The idea of dividing academic inquiry into discrete categories 
dates back to Plato and Aristotle, but by the sixteenth century, 
Francis Bacon and other philosophers were mourning the 
fragmentation of knowledge. 

One problem lay in the rapid growth of science: there was too much information spread across the disciplines for any one person to handle. Science historian Peter Weingart of Bielefeld University in Germany points to Carl Linnaeus’s taxonomic treatise Systema Naturae as an example: between its first edition in 1735 and its last in 1768, the catalogue swelled from 10 pages to 2,300, covering 7,000 species.

In the nineteenth century, the disciplinary boundaries of the modern university started to take root. The disciplines surged in number and power after the Second World War, as nations, particularly the United States, boosted their research support. “It’s the moment when universities increased exponentially,” says Vincent Larivière, an information scientist at the University of Montreal in Canada. “And the size of the university increased by creating more departments.”

Tensions between the United States and the Soviet Union also played a part, says Weingart. The Soviets boasted a research programme geared towards solving societal problems, for example improving agriculture to boost food security. By contrast, US President Dwight Eisenhower argued that basic research should be untethered. “In the field of intellectual exploration, true freedom can and must be practised,” he said in a 1959 speech. And although basic research need not necessarily be disciplinary, it does not have the same pressure towards interdisciplinarity as does applied research.

Specialities proliferated as individual disciplines were repeatedly subdivided. Biology was split into botany and zoology, then into evolutionary biology, molecular biology, microbiology, biochemistry, biophysics, bioengineering and more. Late last year, Jerry Jacobs, a sociologist at the University of Pennsylvania in Philadelphia, counted the number of biology-related departments at Michigan State University in East Lansing. There were nearly 40.

From this thicket, the term ‘interdisciplinary’ emerged. The earliest citation in the Oxford English Dictionary dates back to December 1937, in a sociology journal. But even at that time, some believed that the word was already overused. In a report to the US Social Science Research Council in August that year, a sociologist at the University of Chicago in Illinois lumped ‘interdisciplinarity’ in with other “catch phrases and slogans which were not sufficiently critically examined” (R. Frank Items 40, 73-78; 1988).

As an academic movement, interdisciplinarity caught on during the 1970s and has been growing ever since, says Larivière. He credits that rise in part to libraries, which began to stockpile subscriptions and improved researchers’ access to journals in alternative fields. A particle physicist could more easily browse biology journals, say. Furthermore, the US focus began to shift from basic research and scientific liberty back to societal problems such as environmental protection, which can rarely be tackled by a single discipline.

The United States was not alone: in 1994, an influential book partially sponsored by the Swedish Council for Planning and Coordination of Research called The New Production of Knowledge (Sage) predicted, among other things, an increasingly interdisciplinary future as science seeks to solve socially relevant questions. That book had an impact, says Larivière, particularly in the European Union’s Fifth Framework funding programme, which ran from 1998 to 2002 and emphasized interdisciplinary, problem-oriented research.

Soon, interdisciplinary institutes began to sprout up 
around the world, each with its own unique structure and 
purpose. One of the first, the Santa Fe Institute, founded in 
1984, focused on applying advanced mathematics and computational skills to a range of disciplines. Others, such as the Massachusetts Institute of Technology’s David H. Koch
Institute for Integrative Cancer Research in Cambridge, or 
the neuroscience-focused Janelia Research Campus in Ash- 
burn, Virginia, tackle questions within a specific discipline 
but draw in work from other fields. And some, such as the 
Monash Sustainability Institute in Clayton, Australia, focus 
on specific problems. 

Even as the trend gained momentum, interdisciplinary 
researchers continued to hit the same hurdles that Brown 
had encountered. In 1998, chemist Richard Zare at Stanford 
University in California helped to launch the interdiscipli- 
nary institute Bio-X. But an influential colleague urged him 
not to move his lab into the Bio-X building. Doing so would 
essentially take Zare away from the chemistry department 
and his committee and teaching duties there, the colleague 
argued, weakening the department. 

Although he was well established, Zare worried about 
going against the establishment. “It was very serious,” he 
says. The risk is even greater for young professors seeking 
tenure, he notes. 

In 2004, in response to the growing interest in interdiscipli- 
nary work — and the challenges that face those who attempt 
it — the US National Academies released a report called 
Facilitating Interdisciplinary Research. The authors advised 
institutions to lower barriers, for example by making budgets 
flexible so that costs could be shared across departments. 

The publication drew a large audience. It has been down- 
loaded more than 7,600 times and had impact beyond US 
shores. At Durham University, UK, says physicist Tom 
McLeish, administrators referred to the report when they 
were forging a series of on-campus interdisciplinary centres. 
Around that time, McLeish was serving as pro-vice-chancel- 
lor of research, and saw interdisciplinarity as a way to make 
the small university shine on the world stage. He battled with 
department chairs who feared that the centres would reduce 
their budgets, and he worked to set up a promotion system 
that rewards investigators on large team grants in the same 
way as those on single-investigator grants. The university 
now has interdisciplinary centres on topics ranging from 
resilience — both ecological and psychological — to the 
history of medieval science. 

The interdisciplinary trend is also growing in Asia. In 
2000, the National Natural Science Foundation of China 
(NSFC) laid out a plan for interdisciplinary research, and
universities have launched several cross-cutting centres 
over the past decade, including the Academy for Advanced Interdisciplinary Studies at Peking University in Beijing. The NSFC plans to launch further interdisciplinary projects in the coming years, says Yonghe Zheng, deputy director-general of the foundation’s Bureau of Science Policy. “China is a developing country,” he says. “So the universities and institutes can
quickly set up some new centres which reflect the new trend 
in interdisciplinary research.” 

Nanyang Technological University in Singapore estab- 
lished its Interdisciplinary Graduate School in 2012; it 
already has 335 students, out of a total graduate-school 
population of 2,000. Nanyang’s interdisciplinary graduate 
programme, which bills itself as the first of its kind in Asia, 
was designed in part to expand the university’s fundraising 
options, says Bo Liedberg, dean of the programme. Because 
industry is often focused on real-world problems that cross 
disciplines, an interdisciplinary programme could foster 
more collaborations with business, he reasons. 

That focus on interdisciplinarity as a revenue stream is 
widespread, says Merlin Crossley, a molecular biologist and 
dean of the faculty of life sciences at the University of New 
South Wales in Sydney, Australia. “There is constant pressure 
on me to make a cross-faculty, cross-institution alliance,” he says. “If I want to build a new building, the more allies I have, the easier it is to raise the money.” Arizona State University
in Tempe saw its federal funding rise by 162% from 2003 to 
2012 as it promoted interdisciplinarity across its campus (see 
Nature 514, 292-294; 2014). 

Despite this pressure, interdisciplinarity’s reach remains 
modest. For every Nanyang or Durham, there are hundreds 
of universities that have not embraced significant change. 
Departmental dividers remain in place — and in power — at 
most institutions, says Nancy Andreasen, a neuroscientist at 
the University of Iowa in Iowa City who co-chaired the committee that wrote the National Academies report more than
a decade ago. “It has been an enormous disappointment.” 


For institutions or programmes that have embraced 
interdisciplinarity, the transition has not always been easy. 
The most common mistake is underestimating the depth 
of commitment and personal relationships needed for a 
successful interdisciplinary project, says Laura Meagher, 
a consultant based near St Andrews, UK, who coaches 
interdisciplinary teams. “You see people who think it’s not 
much more than stapling a bunch of CVs to the back of a 
proposal,” she says. “They don’t realize that it takes time to build a relationship.”

When the push for collaboration comes from the top, some 
of that focus on personal relationships could be lost — leav- 
ing the project to suffer, she says. The UK Energy Research 
Centre (UKERC) in London, which since 2004 has coordi- 
nated and carried out sustainable-energy research, learned 
how delicate interdisciplinary relationships can be, says Mark 
Winskel, a social and political scientist at the University of 
Edinburgh who evaluated the centre's first decade. Its initial 
five-year phase went well, he says, and culminated in a key 
publication: Energy 2050, which synthesized the institution's 
results and translated them into recommendations. But the 
next five-year phase failed to produce a similar achievement. 

Winskel surveyed members and found that changes in the 
UKERC’s structure designed to open it to a wider commu- 
nity — for example by offering several rounds of fresh grants 
in the middle of phase two — had upset some established
long-term relationships. “We became a more diverse com- 
munity of scholars and disciplines,” he says. “But that also 
means you become less cohesive.” The UKERC learned from 
the experience: its third phase, launched in May 2014, aims to 
provide more stability for collaborative relationships. 

Social scientists in particular often face that lack of 
cohesion, says Thomas Heberlein, a social psychologist at the 
University of Wisconsin—Madison. When funders emphasize 
the societal impacts of the work they support, social scientists
are often called in to assess the broader implications of a
project. But, he says, it is obvious — and insulting — when a
social scientist is asked to join a project as a way to tick a box, 
without a true commitment to incorporating the discipline 
into the project. 


SOCIAL STRUGGLE


Several UK studies have found that social scientists are less 
likely than researchers in other disciplines to want to par- 
ticipate in interdisciplinary projects. For Heberlein, who 
has long collaborated with ecologists and environmental 
scientists, one of the stumbling blocks is what he calls “the 
hegemony of the natural sciences”. Those disciplines tend to 
be held in higher esteem than more qualitative fields such 
as the social sciences, and they are deemed more rigorous 
by funders and researchers, he says. That imbalance leads to 
frustration and undermines collaboration. Heberlein, whose 
speciality is conducting surveys of public opinion, says
that natural scientists often naively suggest that they can 
design and execute surveys themselves using an Internet 
tool such as SurveyMonkey. Heberlein disagrees: “It’s really 
hard to do the stuff we do,” he says. “Our measurements are 
complicated.”

Lack of respect can run in many directions when different 
kinds of researchers come together. Wood says that bio- 
engineers are always cautioned against having their grants 
reviewed by panels of biologists, who may be dismissive of 
engineering research goals and measurements. But he has 
also served on review panels in which engineers have recoiled 
at the limitations of clinical research. 

As more researchers become involved with interdiscipli- 
nary work, the mutual suspicion has started to ease. There 
have also been some signs of success in the funding arena. 
The US National Institutes of Health (NIH), for example, 
says that interdisciplinary proposals fare as well as, or slightly 
better than, more conventional applications. The European 
Research Council, by contrast, has noted that interdiscipli- 
nary grant proposals on average do not fare as well in review 
panels as projects that are narrower in scope. 

The atmosphere for publishing is also mixed. Interdiscipli- 
nary researchers have long complained that it is difficult to get 
their papers into top-tier disciplinary journals. Heberlein says 
that the rise of interdisciplinary journals has helped in his field, 
but he worries about the standard of some of the papers they 
publish. And he questions the wisdom of training graduate 
students across disciplines before they have immersed themselves
in the rigours of one area. “You’ve got to develop your
disciplinary skills first,” he says. “The bad news is the quality of
this research is pretty bad and may be getting worse.” 

Many view the institutional push for interdisciplinar- 
ity as an experiment in progress. “The celebrations have 
begun, but the actual data on what kind of difference this 
makes are not in,” says Scott Frickel, a sociologist at Brown
University in Providence, Rhode Island.

As more institutions adopt new ways to organize research, 
some are also trying to rethink their assessment processes, 
says McLeish. In July, he and his colleagues at Durham
released a report called Evaluating Interdisciplinary Research, 
and he was surprised when academic societies and funders 
flocked to learn more. “We didn't anticipate that we'd be 
launching this report into an atmosphere where everyone 
wants to know this,” he says.

And the pace of change varies across the globe. In the 
United States, the NIH ran a programme to stimulate inter- 
disciplinary research from 2004 to 2012. It resulted in some 
changes, such as starting to recognize multiple principal 
investigators on what had been considered single-inves- 
tigator grants — a switch that removed a disincentive to 
collaborate. Since then, the agency has not perceived a need 
to follow up with any other incentives, noting that there are 
more than 4,000 active NIH-funded research projects that 
bill themselves as interdisciplinary. “Our general sense is that 
interdisciplinary research has become a very standard way 
of doing science,” says Betsy Wilder, head of the NIH Office
of Strategic Coordination. “It really pervades NIH funding.”

In some other countries, the experiment has just begun. 
Chemist Ayyappanpillai Ajayaghosh, director of the National 
Institute for Interdisciplinary Science and Technology in 
Thiruvananthapuram, India, says that momentum is building 
in his country to promote more interdisciplinary projects. In 
Japan, theoretical physicist Tetsuo Hatsuda left the University 
of Tokyo in part because he felt that the boundaries between 
disciplines were too heavily enforced there. In 2013, he joined 
the RIKEN research institute in Wako, Japan, and launched 
an interdisciplinary team of theoretical physicists, chemists 
and biologists to work out techniques that will accelerate 
all three fields. He hopes that the effort will stimulate more 
interdisciplinary work in the country. “Japan is a little behind 
other countries,” he says. “Theoretical science is a good start-
ing point because it is easy for us to interact.” 

Some 25 years after it opened, the Beckman Institute's exper- 
iment in interdisciplinary research has been a success, says 
Brown. The centre continues to attract distinguished faculty 
members and large team grants — last year it won a research 
contract worth up to $12.7 million from the federal govern- 
ment’s Intelligence Advanced Research Projects Activity 
programme — even though competition for such money has 
increased as more universities build interdisciplinary teams. 

And Brown bristles at the suggestion that the global 
push for interdisciplinarity might be a fad. “The answer is 
a resounding ‘no’,” he says. “Things have changed — now
people focus on big problems, and if you go for a big problem
you need to be interdisciplinary.” ■ SEE EDITORIAL P.289


Heidi Ledford writes for Nature from Boston, 
Massachusetts. 
Global funders to focus
on interdisciplinarity 


Granting bodies need more data on how much they are spending on work 
that transcends disciplines, and to what end, explains Rick Rylance. 


Three arguments are often made in
favour of interdisciplinary research.
First, complex modern problems such
as climate change and resource security are 
not amenable to single-discipline investiga- 
tion; they often require many types of exper- 
tise across the biological, physical and social 
disciplines. Second, discoveries are said to be 
more likely on the boundaries between fields, 
where the latest techniques, perspectives and
insights can reorient or increase knowledge’.
The influence of big-data science on many 
disciplines is a good example. Third, these 
encounters with others benefit single disci- 
plines, extending their horizons. 
The arguments against interdisciplinary
work are also familiar. Devotees of normal- 
ized citation measures often contend that 
interdisciplinary research is inferior. Some 
fear that it drains funds, time and energy 
from ‘core’ disciplines. Research funders 
often hear complaints that schemes targeted 
at interdisciplinarity distract researchers. 
There is a persistent argument that ‘you can’t
have inter-disciplines without disciplines’.
According to proponents of interdisciplinarity,
obstacles abound. Academic
institutions’ budgets, governance and pro- 
motion arrangements are usually organized 
around single disciplines, as are processes at 
many granting bodies and journals. Interdis- 
ciplinary research struggles for prestige — as 
measured by quantitative metrics that favour 
single disciplines — and it is trickier to peer 
review. Thus early-stage researchers are often 
advised that starting on an interdisciplinary 
trajectory is not a smart move. 

One striking aspect of this debate is how 
poor the consolidated data are on which to 
base judgements. This is why the Global 
Research Council (GRC) has selected 
interdisciplinarity as one of its two annual 
themes for an in-depth report, debate and 
statement between now and mid-2016. 
(The other is the position of women in sci- 
ence and research.) The GRC is a federation 
of more than 50 national research funders, 
with representatives from countries includ- 
ing Brazil, China, Japan, Russia, the United 
Kingdom and the United States. Participants 
include the US National Science Founda- 
tion, Research Councils UK (RCUK), Sci- 
ence Europe and the Chinese Academy of 
Sciences. I serve on the GRC’s governing 
board, in my capacity as chair of RCUK. 

As it has done in recent years with peer 
review and open access, the GRC aims 
to establish a common position on inter- 
disciplinarity — a topic on many people's 
minds worldwide, and one in which I have a
personal interest. 


GROUND TRUTH 

So, what do we know? The 2014 Research 
Excellence Framework (REF) — a multi- 
year UK exercise that assessed universi- 
ties’ research strengths in 2008-13, and 
which thus determines funding — found 
that, when academics were asked to submit 
cases of research to REF that had significant 
impact outside academia, 80% were inter- 
disciplinary. However, items submitted to 
discipline-based REF panels under-repre- 
sented the quantity of top interdisciplinary 
research published by UK researchers in 
some fields’. These included health sciences, 
mathematics, information technology and 
the humanities. This is despite growth in UK 
interdisciplinary work overall. (The United 
Kingdom’s share of the top 10% most interdisciplinary
research grew from 7.9% to
9.1% in the four years to 2013.) In my view, 
this suggests that researchers perceive inter- 
disciplinary research to be vulnerable to 
discipline-based assessment. 

Further evidence comes from the UK 
government's recent triennial review of the 
country’s seven national research councils’. 
The review heard ‘evidence’ — what I con- 
sider opinion — to the effect that current 
structures did not serve interdisciplinary
research well, and that it was significantly
more difficult to gain funding for this than 
for mainstream activity. The review rec- 
ommended that RCUK — the councils’ 
umbrella body — investigate this, which it 
has been doing. 

It is difficult to get clear answers in 
response to the allegation that funding is 
more difficult to obtain for interdiscipli- 
nary work. Sample tests do not sustain the 
view that success rates for interdisciplinary 
grants are significantly adrift. But fund- 
ing data are not easily analysed in this way. 
This is in part because there are different 
schemes under which interdisciplinary work 
is undertaken: for example, through ‘grand 
challenge’-style programmes, fellowships or 
‘highlighted’ opportunities in mainstream 
schemes. Awards are also made in areas in
which interdisciplinarity is simply the norm,
such as design. So, what should be included?
More fundamental, however, is an issue of
definition. What should be measured when
evaluating the funding of interdisciplinary activities?

Arcane debates about whether research is 
inter-, multi-, trans-, cross- or post-discipli- 
nary complicate data collection. People also 
speak of methodological, theoretical, instru- 
mental, critical, restructuring and bridge- 
building interdisciplinarity*. I find this faintly 
theological hair-splitting unhelpful. But there 
are areas in which discrimination is important.
One is the difference between ‘near-neighbour’
and ‘distant’ disciplines.

Interdisciplinary research that involves 
neighbour disciplines is much more com- 
mon, and significantly easier to develop, 
than areas in which the disciplinary stretch 
is vast and the logistics and intellectual challenge
more demanding. This seems a significant
point of analysis and one featured in a
study” by the publisher Elsevier, which used 
a citation-based approach to review inter- 
disciplinarity in the United Kingdom. The 
measure considered the diversity of citations 
and the disciplinary distance between them 
to determine the extent of a paper's discipli- 
nary reach. The German Research Founda- 
tion (DFG) has used similar techniques for 
its funding portfolio, again demonstrating 
significant differences between ‘near’ and 
‘far’ interdisciplinarity — far research being 
more complex to undertake’. 
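The commentary does not give the formula behind the Elsevier or DFG measures. As a hedged illustration only, the sketch below assumes one widely used index that combines the two ingredients described above, Rao-Stirling diversity: the sum of d_ij × p_i × p_j over all pairs of distinct disciplines, where p_i is the share of a paper's references in discipline i and d_ij is the distance between disciplines i and j. The discipline names and distance values are hypothetical, chosen only to contrast 'near' and 'far' reach.

```python
import math
from itertools import permutations

# Hypothetical pairwise distances between disciplines (0 = identical, 1 = unrelated).
# These values are invented for illustration; they are not from the Elsevier or DFG studies.
DISTANCE = {
    frozenset(("chemistry", "physics")): 0.2,
    frozenset(("chemistry", "sociology")): 0.9,
    frozenset(("physics", "sociology")): 0.9,
}


def rao_stirling(shares, distance):
    """Sum d_ij * p_i * p_j over all ordered pairs of distinct disciplines i, j.

    shares maps each discipline to the fraction of a paper's references in it
    (the fractions should sum to 1); distance maps an unordered pair to d_ij.
    """
    if not math.isclose(sum(shares.values()), 1.0):
        raise ValueError("reference shares must sum to 1")
    return sum(
        distance[frozenset((i, j))] * shares[i] * shares[j]
        for i, j in permutations(shares, 2)
    )


near = {"chemistry": 0.5, "physics": 0.5}   # 'near-neighbour' reach
far = {"chemistry": 0.5, "sociology": 0.5}  # 'distant' reach

print(round(rao_stirling(near, DISTANCE), 3))  # 0.1
print(round(rao_stirling(far, DISTANCE), 3))   # 0.45
```

With these invented distances, splitting references evenly between two distant fields scores higher than splitting them between two neighbouring fields, and a paper citing a single discipline scores zero. A real study would need empirically derived distances, for example from citation similarity between subject categories.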


CASE STUDY 

I have personal experience of the challenges
of interdisciplinary working. My back- 
ground is in English literature, but I have 
worked for many years on the history of psy- 
chology, in particular on the intersection of 
mind and biomedical systems. Separately, I
work with neurologists on what the brain is 
doing when a person reads complex verbal 
artefacts such as poems. This is tested 
experimentally using functional magnetic 
resonance imaging. 

My personal interest is in why, in brain-processing
terms, culture might be good
for you (if it is). Clinicians have different —
but compatible — concerns, for example in 
recovering advanced reading functions and 
well-being following head injury. Education- 
alists are interested in information process- 
ing and interpretation. 

Of my two areas of research — one his- 
torical, the other experimental — the first 
is not much of a stretch, intellectually or
methodologically. The second is. I had to 
learn new things: to work in a team, to work 
with complicated machinery, to observe 
ethical protocols and to raise money. I have 
had to acquire knowledge of brain anatomy 
and statistical analysis, and learn a different 
research mindset. This has been far from 
straightforward. It has meant, for instance, 
adjusting how I think about elementary 
issues such as ‘what constitutes sufficient, 
appropriate evidence?’; methods of analy- 
sis; how inferential conclusions can be sus- 
tained; and how to write up results. 

The generic protocols of a scientific 
paper and those for a piece of humanities 
research are very different. This is a matter 
both of how to express oneself and of the 
way the proposition is shaped in the first 
place. I have found that it is easy to be too 
‘arty’ for the scientist and too ‘sciencey’ for
the arts researcher. A humanities colleague 
remarked that the statistics “might as well be 
in Russian”; a scientist asked why the poems 
we used in the neurology experiments were 
by different people (for example, Shake- 
speare and Milton): couldn't we just write 
our own for consistency? 

And then there is the question of serial 
investigation. The cycle of grant, paper, 
grant, paper and so on does not pertain in 
the humanities, in which articles tend to 
emerge from longer projects that culminate 
in a book. In my experience, issues about 
raising grants (from whom?), satisfying 
peer review (from which constituency?) and 
gaining career recognition are relevant. But 
paramount is confronting the groundwork 
challenges that come with interdisciplinary 
work — especially those that require ‘stretch’ 
— and doing so with integrity, honesty and a
degree of disciplinary self-denial. 

There is evidence that the first steps in 
establishing interdisciplinary projects are 
crucial. This was a finding of a review’ of the 
European Union's efforts to stimulate inter- 
disciplinary work under its Fifth Framework 
Programme for research development. Pro- 
jects did not succeed as well as they might 
have because they did not facilitate ‘enabling
conversations’ from the outset and because
they lacked coherent leadership. Interdis- 
ciplinary work requires particular skills, 
mindsets and attention to establishing 
common ground**. 


FACT FINDING 

Interdisciplinarity will be a headline topic 
at the GRC annual meeting in Delhi in 
May 2016, organized by India’s Science 
and Engineering Research Board and 
RCUK. A report on the state of play 
worldwide is being commissioned by 
RCUK, on behalf of the GRC (the team to 
undertake the research will be appointed 
in October). 

The report will survey current policy 
and practice among global research 
funders. What forms of support do 
they offer to interdisciplinary research? 
How and where is it done? What are its 
outputs and impacts? The survey will 
begin to establish base data on how inter- 
disciplinarity can best be stimulated and 
managed, and look for good practice 
in this most precious and complex of 
research endeavours. 

The GRC expects to issue a policy 
statement following this meeting, as it has 
done previously on topical areas. These 
documents focus and clarify attitudes 
on key subjects. They marshal data that 
can be used while national policies are 
established and international coopera- 
tion is developed. We need much bet- 
ter definitions of what kind of thing we 
are supporting when and if we support 
interdisciplinary research, and better 
intelligence about what works. ■


Rick Rylance is chief executive of the 
Arts and Humanities Research Council, 
chair of Research Councils UK, and a
member of the governing board of the 
Global Research Council. 

e-mail: r.rylance@ahrc.ac.uk 
Equipping cities to weather our changing climate takes many disciplines working together.


How to catalyse collaboration


Turn the fraught flirtation between the social and 
biophysical sciences into fruitful partnerships 
with these five principles, urge Rebekah R. Brown, 
Ana Deletic and Tony H. F. Wong. 


An urgent push to bridge the divide
between the biophysical and the
social sciences is crucial. It is the only
way to drive global sustainable development 
that delivers social inclusion, environmen- 
tal sustainability and economic prosperity’. 
Sustainability is the classic ‘wicked’ problem’, 
characterized by poorly defined require- 
ments, unclear boundaries and contested 
causes that no single agency or discipline is 
able to address’. 

It is crucial to understand, then, why so 
many well-meaning attempts at interdisci- 
plinary collaboration fail to deliver tangible 
outcomes — and why others succeed. Here 
we offer an unapologetically personal answer 
by reflecting on how, working across multi- 
ple faculties of Monash University in Mel- 
bourne, Australia, we have built a team of 
disciplinary experts that delivers integrated
and sustainable water management across 
multiple cities. 

We have now grown this interdisciplinary 
team to incorporate other institutions nation- 
ally and internationally. At the same time, we 
acknowledge that substantial transaction 
costs come with interdisciplinary research — 
it takes extra time and effort to make it work. 


PERSONAL JOURNEY 

Our journey began in the early 2000s, with 
two maturing groups working on urban 
water research: one in the faculty of engi- 
neering, focused on sustainable stormwater 
technologies, and the other in the faculty of 
arts, focused on urban water governance (see 
Supplementary Information; go.nature.com/ 
pjgbmn). The research teams had a common 
impact agenda, and our collaboration grew 
from a realization that an interdisciplinary 
approach would be more effective. In 2005, 
the two groups joined and secured funding 
for the establishment of a Aus$4.5-million
(US$3.1-million) Facility for Advancing
Water Biofiltration* that brought together 
more than 20 Monash researchers and PhD 
students across civil engineering, ecology 
and sociology. By 2012, this had culminated 
in the award of a Aus$120-million Coopera- 
tive Research Centre (CRC) for Water Sensi- 
tive Cities. It comprises a partnership of more 
than 85 organizations, including 13 research 
institutions, and around 230 researchers and 
PhD students from more than 20 disciplines 
and subdisciplines across the social and bio- 
physical sciences and humanities. 

Over the past decade, our collaborations 
have increasingly made a practical differ- 
ence. We produce regular synthesis docu- 
ments (see, for example, ref. 5) containing 
technology information and enabling policy 
advice, written in an accessible way to facili- 
tate engagement and uptake. These have 
been heavily used in policy and strategy doc- 
uments, which speeded up the adoption of 
our research. For example, stormwater reg- 
ulations introduced in the state of Victoria 
in 2006 were underpinned by our research, 
and other state and local governments in 
Australia have adopted our recommended 
performance targets for the management 
of urban run-off. As a consequence, our 
stormwater-biofiltration technology has 
been increasingly adopted in cities across 
Australia‘, Singapore, China and Israel. 
Since 2010, our expanded framework for 
integrated city-wide water-cycle manage- 
ment”* has been used by governments (such 
as those of Australia, Singapore and China) 
and international organizations (such as the
Asian Development Bank) to guide their
strategic planning and investment. 

In that time, we have had to resolve 
considerable tension, which hinders mean- 
ingful collaboration. The biophysical sci- 
ences tend to have well-agreed theories; the 
social sciences spend much time developing 
(and often disagreeing on) theoretical ques- 
tions. Both fields have control and compari- 
son at their core. But biophysical researchers 
mainly perform quantitative research (often 
in well-controlled and replicable laboratory 
conditions), whereas social science can be 
qualitative or quantitative, and also use 
interpretative validation approaches. 

We witnessed biophysical researchers 
accusing social scientists of poor rigour and 
of spending too much time conceptualizing 
problems without exploring and offering 
solutions. Conversely, social scientists were 
often frustrated that biophysical researchers 
were too focused on solutions, reductively 
overlooking the wider societal implications 
of their proposed solutions. 

This discord is exacerbated by an inherent 
cultural hierarchy that often privileges the 
biophysical over the social sciences. Environ- 
mental problems have typically been framed 
from a biophysical perspective, meaning that 
social scientists are not effectively engaged in 
developing integrated solutions’. 


FIVE PRINCIPLES 

The journey was not for everyone, and we lost 
some talent along the way. Yet many stayed 
on. How did we help academics to overcome 
these biases? We used these five principles. 


MAKE IT MAINSTREAM 


Forge a shared mission. Driving our 
collaborative journey was the shared mis- 
sion of delivering water-management strat- 
egies that address the challenges of floods, 
droughts and degraded waterways. This 
approach fosters more sustainable, resilient, 
productive and liveable cities — for a healthy 
planet and population. The shared mission 
provided a compelling account of the overall 
goal of the collaboration, included impact 
as a necessary outcome, and was sufficiently 
broad to incorporate meaningful roles for all 
disciplinary researchers involved. 

This mission also maintained a sense of 
purpose in the face of occasional failure and 
of the ongoing investment of huge time and 
effort to appreciate the norms, theories and 
approaches of other disciplines. When we 
needed the input of certain disciplines, and 
hastily included researchers who did not
share the mission, it was not a success. The 
subsequent departure of these researchers 
from the team initially weakened the skill set 
of the group, but provided the motivation 
to expand our collaboration across multiple 
institutions. 


Develop ‘T-shaped’ researchers. In our 
experience, interdisciplinary collabora- 
tions have the greatest chance of success 
when researchers are “T-shaped’!” — able to 
cultivate both their own discipline, and to 
look beyond it. Breadth and depth are key. 
T-shaped researchers build credibility by aim- 
ing for the highest scientific contribution in 
their field — a point of particular importance 
for early-career researchers, whose prospects
for promotion are judged against research
excellence criteria (see principle 5). T-shaped
researchers also engage actively with other
disciplines (see principle 3) to understand and
appreciate their norms, theories, approaches
and breakthroughs.

Ways to promote interdisciplinary research

Funders

• Manage funding from an interdisciplinary perspective while
reinforcing research impact. Discipline-based agencies must form
joint funding programmes.

• Panels should include a balance of experts from the social and
biophysical sciences, with a strong appreciation of other disciplines.
It is also useful to include end-users of the research (for example,
practitioners and policymakers).

• Calls for funding should request balance between disciplines and
prefer teams that have a proven record of collaboration. Publication
in applicants’ own disciplines should be essential; publishing in other
disciplines is desirable.

Institutions

• Introduce key performance indicators that promote T-shaped
researchers. For example, include qualitative measures of impact on
policy and practice, as well as conventional academic indices.

• Identify institutional research strengths that show potential for
interdisciplinary collaboration and incentivize it through seed grants.

• Reduce transaction costs: for example, through summer schools to
develop constructive dialogue skills. Provide platforms — seminars,
research workshops, debating competitions — to discuss challenges
in cross-disciplinary research and offer insights into the norms and
cultures of other disciplines. Co-locate researchers from different
disciplines who work on the same grand challenges.

• Invest in interdisciplinary PhD cohorts, co-supervised by academics
from diverse departments or faculties.

Publishers

• Invest in and create high-quality interdisciplinary journals, managed
by editorial teams or boards of T-shaped researchers.

• Run special issues in high-impact, single-discipline journals that
focus on interdisciplinary research.

• Peer reviewers should assess work using their disciplinary expertise,
while being tasked to be open to innovations across disciplines.

Researchers

• Build stamina, patience and self-awareness to manage the long
journey of establishing a productive interdisciplinary team.

• Put your best ideas forward even if they are unfinished, and be
open to alternative perspectives from other disciplines, policymakers,
industry practitioners and community members.

• Prioritize depth early on, and embrace breadth by building
relationships with those from other fields and practices.

Many believe that interdisciplinary 
research delays career progression or is the 
luxury of senior researchers. This has not 
been our experience: many of our research- 
ers were able to maintain a high publication 
rate in their own discipline, and — as part 
of a team — secure increasing interdisci- 
plinary research funding. However, it took 
nearly five years to start publishing our joint 
interdisciplinary research in high-impact 
journals. 


Nurture constructive dialogue. Through a 
decade of trial and error, we have invested 
heavily in creating the environment and 
informal rules that empower researchers 
across all sciences to engage effectively, 
despite their vastly different approaches to 
research design and methodology, and their 
differing technical vocabularies and com- 
munication cultures. 

This has involved some commitments: to 
interact in plain English (disciplinary jargon 
is frowned on); to foster empathy and respect 
for different disciplinary norms; and to reflect 
on what is working in collaborative interac- 
tions. We designed regular interdisciplinary 
forums using these rules. This led to the co- 
development of key publications — for exam- 
ple, through interdisciplinary workshops, we 
have jointly written three annual reports for 
policymakers and water practitioners’. These 
activities grew into a sought-after annual 
short course and a massive open online 
course (MOOC) showcasing different disci- 
plinary approaches to urban water challenges. 

Reaching the ideal of constructive com- 
munication across the sciences takes time 
and practice — researchers new to the 
group may not yet have the necessary skills. 
Typically, they pass through three stages of 
development (see ‘Journey to T’). Initially, 
new collaborators tend to dominate dis- 
cussions and assert the primacy of their 
discipline. Soon after, they recognize the 
importance of other disciplines and adopt 
a more passive demeanour. Eventually, the
researchers settle into a space of construc- 
tive dialogue. 

We find that some quit and others stay 
to become mature collaborators, able to 
co-create across academic disciplines and 
broader networks. The role of more expe- 
rienced collaborators is to support new 
colleagues’ personal journeys into these 
dynamic relationships.

[Figure: JOURNEY TO T. Researchers who are new to working with people from other disciplines oscillate between asserting the primacy of their own field and hanging back. With time they can become capable of breadth and depth (T-shaped), and able to engage in constructive dialogue and co-creation. Behaviour stages: dominance (high), listening (low); passivity (high); constructive dialogue, listening (high). Annotations: nurture nascent skills in safe learning environments, interdisciplinary forums, synthesis workshops and writing groups; support dynamic learning with informal rules such as plain speaking, open-mindedness, empathy and respect; experienced researchers develop the skills for interdisciplinary working in enduring partnerships towards shared goals.]

Give institutional support. Academic
career pathways for interdisciplinary
research are essential if it is to attract and
retain the brightest and best. Monash
University’s senior leadership team con- 
sistently signalled that it values research 
that is interdisciplinary, attracts significant 
industry involvement and delivers real- 
world impact — despite the organizational 
structures and global academic norms that 
are biased towards more conventional, dis- 
ciplinary approaches. 

This value was communicated to research- 
ers through university policies, promotion 
criteria and seed-funding programmes. 
For example, the engineering faculty has introduced qualitative research standards (alongside the conventional quantitative measures) that attempt to measure the [illegible]. The faculties of engineering and arts now award small competitive grants to teams from both faculties to catalyse collaborations.

Monash has established a PhD pro- 
gramme for cohorts of students working on 
a common global challenge across a number 
of disciplines; for instance, sustainable urban 
water management in developing Asian 
cities. These groups work in a constructive 
dialogue environment. 


Bridge research, policy and practice. 
Finally, the establishment of enduring 
connections between researchers, policy- 
makers and industry practitioners proved 
to be an important driver in growing our 
interdisciplinary collaborations. Refresh- 
ingly, industry rarely thinks in disciplinary 
silos. They tend to tackle complex problems 
from a range of perspectives, thereby model- 
ling integrated, solution-focused thinking. 
To ensure real-world impact, we engaged 


policy and industry partners in the design 
of our research programme and encouraged 
them to critique our scientific approach and 
presentation of results. We also ran frequent 
events that allowed professionals from policy 
and industry to interact with researchers. 
For example, in 2008, through a national 
roadshow, we showcased how our research is 
addressing crucial water challenges around 
Australian cities. Aimed at policymakers and 
industry and community leaders, it stimu- 
lated research and partnerships. 

Despite our rewarding experience, inter- 
disciplinary research is still on the margins. 
We urge researchers, institutions, and funding 
bodies committed to sustainable development to make it mainstream (see ‘Ways to promote interdisciplinary research’).


Rebekah R. Brown, Ana Deletic and 
Tony H. F. Wong are at Monash University 
in Melbourne, Australia, and in the 
Cooperative Research Centre for Water 
Sensitive Cities. R.R.B. is also director of the 
Monash Sustainability Institute. 

e-mail: rebekah.brown@monash.edu 
INTERDISCIPLINARITY


Inside Manchester’s ‘arts lab’ 


Peter E. Pormann on the revelations a meshing of technology and humanities can yield. 


Digital pioneer Steve Jobs delivered a
potent commencement address at
Stanford University, California, in
2005. He described how, as an undergradu- 
ate, he had studied calligraphy rather than 
his prescribed curriculum (he later dropped 
out). Calligraphy may have seemed at the 
time to have no practical application, but a 
decade later, when Jobs was working on the 
Mac, it enabled him to promote proportional 
fonts and establish Apple as the gold standard 
in desktop publishing. Jobs fruitfully com- 
bined the “liberal arts and technology” — a 
phrase he used repeatedly in his last keynote 
addresses before his death in 2011. 

Productive interaction between the arts 
and sciences is at the heart of the John 
Rylands Research Institute at the University 
of Manchester, UK. Founded in April 2013, 
the institute (which I direct with associ- 
ate director and head of special collections 
Rachel Beckett) now has a staff of more 
than two dozen. It brings together scientists, 
conservators, curators, digital-imaging spe- 
cialists and humanities scholars to unravel, 
reveal and realize the research potential of 
the University of Manchester Library’s spe- 
cial collections. These run from clay tablets 
to e-mail archives. Highlights include Greek, 
Coptic and Arabic papyri, medieval Hebrew 
and Persian manuscripts and early-modern 
printed books — such as one of the world’s 
finest collections of volumes printed by 
Renaissance humanist Aldus Manutius. The 
institute was established in response to the 
rise of digital humanities, a field that enables 
the study of books and manuscripts in ways 
that were unimaginable a generation ago. 

There have been triumphs and tribula- 
tions. We have raised more than £3 million 
(US$4.6 million) in funding from sources 
such as the British Academy and biomedi- 
cal-research charity the Wellcome Trust. The 
institute sits in the already-crowded John 
Rylands Library, where its rapid growth is a 
challenge. But our ‘arts lab’ is taking research 
into uncharted territories by shattering 
disciplinary and institutional divisions. 

Erased text in the Syriac Galen Palimpsest is made visible by multispectral-image analysis.

To make complex collaborations work, we instigated a buddy system. All researchers (PhD students, postdocs, visiting academics and colleagues with funding for pilot studies) are allocated a curator with intimate knowledge of the materials they study.
Art-history postdoc Elizabeth Savage, for 
instance, won a three-year early-career 
fellowship from the British Academy to 
study thousands of fifteenth- and sixteenth- 
century prints collected by Hiero von Hol- 
torp, a nineteenth-century scholar of early 
printing technology and aesthetics. Her 
buddy is visual-collections manager Stella 
Halkyard, who helped to rediscover this 
remarkable legacy. Savage also works with 
colleagues at the library’s Centre for Heritage 
Imaging and Collection Care (CHICC),
who pioneer innovations in colour print 
photography, such as lighting techniques 
for imaging gold. Combined with close-ups 
of pigments, these techniques have helped 
Savage to identify some of the earliest exam- 
ples of printed gold ink. 

Work at the CHICC is also revolutionizing 
understanding of papyri and palimpsests — 
manuscripts from which text has been erased 
to allow reuse of the page. Researchers have 
made detailed images of artefacts using 
cutting-edge technology: a 60-million-pixel 
digital sensor, combined with a MegaVision
EV LED illumination system. This com- 
bines high-resolution photography with 
multispectral imaging, which captures data 
at frequencies across the electromagnetic 
spectrum. It can reveal once-unreadable texts, because different inks reflect light differently at different wavelengths. Thus papyrologist Roberta Mazza has discovered the ‘Last Supper amulet’, a papyrus with biblical passages on one side and a grain-tax receipt on
the other. Mazza traced its provenance to 
near ancient Hermoupolis in Egypt, close to 
modern Al Ashmunayn. 

We are also collaborating with scientists 
including Mark Dickinson, a physicist and 
medical-imaging specialist at Manchester's 
Photon Science Institute. Medical imaging is 
rich in techniques that can be used to analyse 
artefacts, such as optical coherence tomogra- 
phy, which is usually harnessed for imaging 
tissue or visualizing blood flow. Dickinson 
has tested it on carbonized papyri too deli- 
cate to unroll, revealing hidden text. 

Also key to investigating the collections 
is image analysis. We are using statistical 
techniques such as canonical variate analysis 
(CVA), which compares group structures 
in multivariate data, to read erased text on 
palimpsests. CVA is applied to a multispectral 
image and an algorithm is trained to recog- 
nize overlying text, the erased underlying text 
and areas where the two coincide. This effec- 
tively maximizes the contrast, so the under- 
text ‘pops’ out and becomes more readable. 
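To make the CVA step concrete, here is a minimal sketch of textbook canonical variate analysis (equivalently, multi-class linear discriminant analysis) in Python with NumPy. It assumes the multispectral image is an array of shape (height, width, bands) and that some pixels have been hand-labelled as overtext, undertext or overlap; the function names `canonical_variates` and `enhance` and the synthetic spectra are hypothetical, and this is not the Manchester project's own software. The first canonical variate is the band combination that maximizes between-class variance relative to within-class variance, which is why projecting onto it stretches the contrast between the two texts.

```python
import numpy as np


def canonical_variates(X, y, ridge=1e-6):
    """Textbook CVA (multi-class LDA) on labelled pixel spectra.

    X: (n_pixels, n_bands) reflectances; y: (n_pixels,) class labels,
    e.g. 0 = overtext, 1 = undertext, 2 = overlap (hypothetical labelling).
    Returns eigenvalues (largest first) and matching directions as columns.
    """
    n_bands = X.shape[1]
    grand_mean = X.mean(axis=0)
    Sw = np.zeros((n_bands, n_bands))  # within-class scatter
    Sb = np.zeros((n_bands, n_bands))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        centred = Xc - Xc.mean(axis=0)
        Sw += centred.T @ centred
        d = (Xc.mean(axis=0) - grand_mean)[:, None]
        Sb += len(Xc) * (d @ d.T)
    # A small ridge keeps Sw positive definite when bands are highly correlated.
    Sw += ridge * np.eye(n_bands)
    # Solve Sb w = lambda Sw w by whitening with the Cholesky factor Sw = L L^T.
    L = np.linalg.cholesky(Sw)
    L_inv = np.linalg.inv(L)
    evals, U = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(evals)[::-1]
    return evals[order], L_inv.T @ U[:, order]


def enhance(cube, W, k=0):
    """Project each pixel of an (h, w, bands) cube onto canonical variate k,
    then rescale to [0, 1] so the most discriminating contrast fills the range."""
    h, w, bands = cube.shape
    scores = cube.reshape(-1, bands) @ W[:, k]
    scores = (scores - scores.min()) / (scores.max() - scores.min())
    return scores.reshape(h, w)


# Synthetic, made-up spectra for three pixel classes in five bands.
rng = np.random.default_rng(0)
means = np.array([[0.2, 0.5, 0.5, 0.5, 0.5],
                  [0.5, 0.2, 0.5, 0.5, 0.5],
                  [0.2, 0.2, 0.5, 0.5, 0.5]])
X = np.vstack([m + 0.02 * rng.standard_normal((200, 5)) for m in means])
y = np.repeat([0, 1, 2], 200)
evals, W = canonical_variates(X, y)
image = enhance(X.reshape(20, 30, 5), W)
```

With real data the directions would be fitted on the labelled pixels and then applied to the whole page; later variates (k = 1, 2, ...) can reveal contrasts the first one misses.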

A £1-million image-analysis project that 
grew partly out of a collaboration with the
CHICC and has received funding from the
UK Arts and Humanities Research Council 
is studying the Syriac Galen Palimpsest. This 
is an eleventh-century liturgical work that 
carries an erased sixth-century undertext 
— a Syriac translation of On Simple Drugs 
by the classical physician Galen (around 
AD 129-216). We already had a large data 
set of multispectral images; now images of the same page are being combined to make the undertext more legible (see picture). Overseeing this is computational primatologist Bill Sellers, who ordinarily uses computer modelling to reconstruct the movements and evolution of extinct species.

All of this work generates large sets of 
images, stored as TIFF files. These raise the 
question of how to store and analyse big data. 
A challenge will be establishing integrated 
systems to allow comparative research across 
platforms. For Greek papyri and Hebrew and 
Persian manuscripts, we plan to develop solu- 
tions with the Cambridge Digital Library; this 
will feed into the iLibrary strategy to bring 
our digital collections and projects under 
one roof. We can also look at large amounts 
of texts and metadata with the tools of com- 
putational corpus linguistics — which studies 
language through samples of real text — and 
text mining, which hunts through text to 
extract data. One such tool is the language- 
processing software system U-Compare. 
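Corpus-linguistics tools need not be elaborate to be useful. As a minimal illustration (it does not reproduce U-Compare, whose interface is not described here), the sketch below builds a keyword-in-context (KWIC) concordance and a word-frequency list, two staples of corpus linguistics, in plain Python. The regular-expression tokenizer is a deliberately crude assumption; work on Greek, Hebrew or Persian texts would need language-specific tokenization and normalization.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; Python's \\w matches Unicode letters by default."""
    return re.findall(r"\w+", text.lower())


def kwic(text, keyword, width=3):
    """Keyword-in-context concordance: each hit with `width` tokens either side."""
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def frequencies(text, n=5):
    """The n most frequent tokens, a first step in many corpus studies."""
    return Counter(tokenize(text)).most_common(n)


sample = "The quick brown fox jumps over the lazy dog"
for left, kw, right in kwic(sample, "the", width=2):
    print(f"{left:>20} [{kw}] {right}")
```

Aligning hits on the keyword is what lets a reader scan hundreds of usages at once; the same pattern scales to an archive by streaming files through `kwic`.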
Some of our collections are born digital
— for example, we hold the e-mail archives 
of local literary publishing house Carcanet 
— and future researchers will undoubtedly 
approach these differently from how they 
look at hand-written correspondence. We 
have begun to collaborate with computa- 
tional linguists at Manchester’s National 
Centre for Text Mining, as well as colleagues 
at the nearby Centre for Translation and 
Intercultural Studies, who have vast experi- 
ence with large sets of multilingual texts. And 
with palaeography — the study of ancient 
handwritings, their dating and their classi- 
fication — artificial intelligence might offer 
research avenues that the institute is keen to 
explore. By training software to recognize 
certain hands and writing styles, one might 
be able to query vast virtual collections of 
manuscripts in unprecedented ways. 

Delivering the institute’s inaugural lecture, 
historian Ann Blair of Harvard University in 
Cambridge, Massachusetts, said: “In embrac- 
ing new media, we must never discard the 
old ones.” The interdisciplinary nature of 
the institute is its signature, the tie that binds 
ancient artefacts to state-of-the-art science. 
These form a dual legacy for future generations, who will want to ask different questions of the library’s remarkable holdings.


Peter E. Pormann is founding director of 
the John Rylands Research Institute at the 
University of Manchester, UK, and principal 
investigator on the Syriac Galen Palimpsest 
project. 
ANTHROPOLOGY


One-man multidisciplinarian 


Clare Pettitt reassesses the legacy of Victorian polymath Richard Francis Burton. 


Richard Francis Burton (1821-90) pursued and mastered knowledge in so many fields, from geography to sexology, that his real legacy for science
is muddied. The flamboyant polymath was 
an eminent explorer, a pioneer of ethnog- 
raphy and a linguist fluent in more than 
25 languages (from Arabic to Swahili) and 
a number of dialects. He wrote or trans- 
lated more than 40 volumes, including The 
Lake Regions of Central Africa, published 
155 years ago, and the first English edition 
of The Arabian Nights (1885). He was also 
an enthusiastic amateur of botany, geology 
and zoology, even running an experiment 
on monkey communication while living 
in Sindh (now Pakistan). Overall, this furi- 
ously energetic multidisciplinarian both 


contributed vastly to knowledge of other 
cultures and continents, and sometimes 
misread them to his — and their — cost. 
These complex interests were the fruit 
of a turbulent mind. The eldest son of an 
army family, Burton had a protean character 
shaped on the road as his parents moved their 
young family restlessly around France and 
Italy. He started to learn Latin at three years 
old and Greek at four, and quickly picked 
up French, Italian and local dialects. At the 
University of Oxford, UK, contemptuous of 
the teaching methods, he honed his mastery 
of languages but was expelled for attending 
a steeplechase. He was soon propelled into 
the Bombay Infantry and immersed himself 
in Indian languages and culture. Violent and 
mesmerizing by turns, he was viewed as both 


prodigiously gifted and morally suspect by 
his contemporaries, as an ‘other’, just as he
himself was possessed by otherness. 

By 1853, Burton had turned to explora- 
tion. Still beset by inner conflicts, he could 
also attract conflict with others. His great 
1856-59 expedition to East Africa with John 
Hanning Speke, instigated by the Royal Geo- 
graphical Society in London, was a case in 
point. It made “formidable contributions to 
imperial knowledge production”, according
to historian Adrian Wisnicki. Although both 
men were seriously disabled by disease, Bur- 
ton became the first European to see Lake 
Tanganyika. He kept dense geographical and 
cultural notes and meteorological records, 
and collected specimens for what are now 
the Royal Botanic Gardens, Kew, and the
Royal School of Mines in London. But the
expedition led to a bitter rivalry between the 
two over the source of the Nile, with Speke 
claiming it as the lake that he dubbed Lake 
Victoria, and Burton feeling that the evidence 
failed to add up. Long after their return, 
in 1864, the British Association for the 
Advancement of Science called for a debate 
in London, but Speke died of an unexplained 
gunshot wound the day before. “The charita- 
ble say that he shot himself, the uncharitable 
say that I shot him,” Burton wrote to a friend. 

Burton was shocked, but published The Nile Basin that year, reiterating his position in the Nile controversy first detailed in The Lake Regions of Central Africa. Burton felt that Speke’s account, Journal of the Discovery of the Source of the Nile (1863), had dressed Africa up in flowery, fundamentally unscientific rhetoric, claiming for instance that a mass of dirty huts (in Burton’s words) was a village built on the most luxurious principles.
Burton insisted on using indigenous names 
and learnt local languages so that he could 
communicate directly with people he met — 
and his investigations would prove invaluable 
to future explorers. “I undertook the history 
and the ethnography, the languages, and the 
peculiarities of the people,” he is quoted as
saying, adding scornfully that to Speke “fell 
the arduous task of delineating an exact 
topography”. Geography, Burton established, 
was a social as well as a physical science. The 
explorer Henry Morton Stanley would prove 
in 1875 that Speke had correctly identified 
the source of the Nile, but he used Burton's 
notes to get there. As Burton put it in Zan- 
zibar; City, Island, and Coast (1872), future 
expeditions “had only to tread in my steps”. 
Throughout a life of trailblazing travel and diplomacy (from Somaliland to Benin, Arabia, the Middle East, Asia and the Americas), Burton’s first epistemological framework for colonial encounters was the ‘Orientalist’ one of linguistic scholarship. But as an
ethnographer, he was original. He mingled 
with the people whose cultures he studied, 
understanding that knowledge is embod- 
ied and must be historically contextualized. 
This was criticized in Victorian England, 
with its horror of ‘going native’, but places
him ahead of his time. Burton was always 
quick to acknowledge the contingencies 
and accidents that brought him into contact 
with local people, and never tried to efface 
himself from his narrative. Only in the late 
twentieth century did anthropologists such 
as John and Jean Comaroff suggest that the 
obvious weaknesses of ethnography as a 
‘science’ are also its strengths, as “participant observation ... connotes the inseparability of knowledge from its knower”. Studies from the 1970s onwards supported this view, including Annette Weiner’s The Trobrianders of Papua New Guinea (Holt, Rinehart and Winston, 1988), a reappraisal of Bronislaw Malinowski’s study of the Pacific Trobriand Islands, Argonauts of the Western Pacific (Routledge and Kegan Paul, 1922).

Ethnographic pioneer and explorer Richard Francis Burton, photographed around 1860.

In other ways, and much less attractively, 
Burton was very much of his time. His respect 
for Muslim culture did not preclude his suc- 
cumbing temporarily to a vicious racism that 
became particularly extreme in the 1860s and 
cannot be exonerated. By the mid-1860s he 
had become one of Britain’s foremost prom- 
ulgators of the polygenist thesis that Africans 
constituted a distinct and inferior species, and 
he helped to found the Anthropological Soci- 
ety of London, established after a dispute with 
the monogenist Ethnological Society. By his 
last decade, Burton had come to his senses, 
embracing the view that all of civilization
came from Africa, and felt that “negroes... 
have shown themselves fully equal in intel- 
lect and capacity to the white races of Europe 
and America”. But the damage had been done.

Despite this sorry chapter, Burton’s 
immersion in a multitude of languages and 
cultures gave him a unique perspective on 
humanity, with “the enormous advantage 
of being capable of comparing native with 
foreign ideas and views of the world”. He 
knew that other cultures could never be 
fully ‘translated’ or subsumed into English, 
and that this militated against the ethos of 
Empire. He was perhaps less Orientalist than 
comparativist and relativist. His contribu- 
tion to the fledgling social sciences was all 
the more powerful, perhaps, for having been 
fed by so many streams of knowledge, even if 
this makes it less visible to us today.


Clare Pettitt is professor of nineteenth-century literature and culture at King’s College
London. She is the author of Dr Livingstone, I 
Presume? and many articles about exploration 
and travel in Victorian print culture. 

e-mail: clare.pettitt@kcl.ac.uk 
New environment
law shows its fangs 


China's revised Environmental 
Protection Law went into effect 
on 1 January this year. Severe 
punishments for polluting 
businesses swiftly followed. 

Some 292 cases incurred 
an accumulating daily fine 
within the first 6 months, 
totalling 236 million yuan 
(US$37 million). The highest 
single levy was 15.8 million 
yuan (data from the Ministry of 
Environmental Protection; see 
www.mep.gov.cn). Over the same 
period, production was curtailed 
in 1,092 cases and equipment was 
locked down in 1,814 instances. 
Criminal charges were brought 
against 740 polluting businesses, 
and 782 were punished with 
police administrative detention. 

Local governments are 
cooperating with the new law, 
contrary to earlier misgivings (see 
B. Zhang and C. Cao Nature 517, 
433-434; 2015 and H. Yang et al. 
Science 347, 834-835; 2015). In 
Linyi in Shandong province, for 
example, several dozen businesses 
(including some responsible for 
high employment and large tax 
revenues) have been closed down. 
Dasheng Liu Shandong Institute 
of Environmental Science, Jinan, 
China. 
liu_sdiep@126.com 


Tailor checklists to 
clinical teams 


The problems of replicating the 
effects of patient-safety checklist 
trials in routine practice could 
be mitigated by adapting 
checklists for individual hospital 
environments and teams (see 
Nature 523, 516-518; 2015). An 
F-16 fighter aircraft would not 
rely on a checklist devised for 
flying a jumbo jet. 

For instance, much of the 
World Health Organization's 
surgical safety checklist 
is irrelevant to a cardiac 
catheterization procedure. There 
is no general anaesthetic or 
expected blood loss, for example,
but monitoring kidney function
is crucial. We therefore designed 
a bespoke safety checklist to brief 
the cardiac clinical team on the 
planned procedure and on any 
potential problems. Endorsed
by the British Cardiovascular
Society (www.bcs.com/ 
checklist), the checklist is 
regularly modified in response to 
end-user evaluation. 

Smart electronic checklists 
will further improve safety by 
highlighting patient-specific 
risks and acting as a guide in 
emergencies and for auditing 
near-misses. 

Thomas J. Cahill Oxford 
University Hospitals NHS Trust, 
Oxford, UK. 

Rod Stables Liverpool Heart and 
Chest Hospital, Liverpool, UK. 
thomas.cahill@cardiov.ox.ac.uk 


Mining shell waste 
will not be easy 


If the chemical industry is
to profit from refining waste
crustacean shells and other
by-products of seafood
processing, collection problems
and food-safety issues need
to be overcome (see N. Yan
and X. Chen Nature 524,
155-157; 2015).

Gathering sufficient animal 
feedstock for commercial 
purposes will be a formidable 
challenge (R. L. Naylor et al. Proc. 
Natl Acad. Sci. USA 106, 15103- 
15110; 2009). The transport and 
storage of seafood by-products 
from different processing plants is 
also likely to be extremely costly. 

Moreover, expensive energy- 
intensive drying of crustacean 
shells would be necessary to 
prevent microbial growth and 
production of carcinogenic
fungal aflatoxins. Other
health risks could arise from 
bioaccumulation of contaminants 
(such as heavy metals in shells) or 
from cross-species transmission 
of pathogens and perhaps even 
of prions through the food 
chain (L. Cao et al. Science 347, 
133-135; 2015). 

Hong-Wei Xiao, Zhen-Jiang Gao
China Agricultural University,
Beijing, China.

A.S. Mujumdar McGill 
University, Quebec, Canada. 
Seal of approval for
ocean observations 


We announce that the Pacific 
Islands Ocean Observing System 
was certified last month as the 
first regional partner to attain full 
membership of the US Integrated 
Ocean Observing System (IOOS). 
This certification is a hallmark of 
the quality of data provided by the 
IOOS, to the benefit of the public, 
the private sector and individuals. 

It is also an indicator to the 
global community that IOOS 
regional partners providing data 
from the oceans, Great Lakes and 
coasts of North America have 
met rigorous criteria for system 
oversight, information security, 
public engagement and financial 
controls. 

The IOOS includes federal 
and non-federal partners in an 
interagency investment by the 
US government of more than 
US$2 billion annually for the 
collection and provision of ocean 
data and for improved forecast 
capabilities. It comprises about 
10,000 unique oceanographic 
data sets and some 4,000 services 
that provide data, metadata 
and refined data products to 
tens of millions of US users. For 
instance, IOOS data are used in 
search-and-rescue operations 
and to ensure safe operation of 
commercial vessels. 

Certified IOOS data can 
be entered in the permanent 
US archive at the National 
Centers for Environmental 
Information and can be used 
internationally by the Global 
Telecommunication System for 
meteorological data. 

Chris E. Ostrander University 
of Hawaii at Manoa, Honolulu, 
Hawaii, USA. 
Lack of help stymied
community care 


John Foot’s book on psychiatrist
Franco Basaglia’s movement to
reform Italy’s psychiatric hospitals 
ends with the passing of Law 180 
in 1978 to close down asylums 
(see A. Tone Nature 524, 290; 
2015). Sadly, the law was poorly 
implemented owing to woefully 
inadequate resources. 

Families received little or no 
support in caring for those who 
returned home. For some it 
was too much, forcing general 
hospitals to take up the slack. 
Psychiatrists found their hands 
tied when confronted with people 
who were seriously mentally 
ill, so many ended up in prison 
stigmatized as criminals. 

Even Basaglia’s widow, Franca 
Ongaro Basaglia — a core 
member of the reform movement 
and later an Italian senator — 
described Law 180 as a failure. 
Laura Spinney Paris, France. 
Ifspinney@gmail.com 


Education reforms 
ring true 50 years on 


Stephen Bradforth and colleagues’ 
discussion of what is needed 
to develop “a science-literate 
population” (Nature 523, 282-284; 
2015) echoes the words of a Nature 
editorial 50 years ago, entitled 
‘New thinking in undergraduate 
teaching’ (Nature 205, 835; 1965). 
According to the editorial, 
“the student is in danger of 
spending too much of his [sic] 
limited time memorizing facts, 
and has insufficient time at his 
disposal to master the principles 
underlying his subject and to 
develop his powers of thought”. It
continues: “the most important
purpose of a university education
is to teach the student to think 
for himself... it may on occasion 
demand a re-examination of the 
whole approach to a subject in 
undergraduate courses.” Indeed. 
Barry S. Winkler Eye Research 
Institute, Oakland University, 
Rochester, Michigan, USA. 
winkler@oakland.edu 
Decompensated cirrhosis and microbiome interpretation


ARISING FROM N. Qin et al. Nature 513, 59-64 (2014); doi:10.1038/nature13568


The diagnosis of cirrhosis, especially in the advanced/decompensated 
stages, is made using simple and inexpensive clinico-radiologic- 
pathological techniques’. Qin et al.’, whose paper has replicated 
prior studies*~*, reported a relatively novel profile to diagnose cirrho- 
sis using complex stool metagenomics despite having a majority (65% 
discovery and 76% validation cohorts) decompensated cirrhotic 
population. We have found that the decompensated cirrhosis cohort, 
which does not require these complicated diagnostic strategies, was 
responsible for a significant proportion of these microbiota changes 
on further analysis of their metagenomics data and using a new cohort 
of 360 subjects. Therefore, given several confounders and the ease of 
decompensated cirrhosis diagnosis using current techniques, a careful 
re-interpretation of newer microbiota-based diagnostic strategies 
that do not a priori differentiate between early (compensated) and 
decompensated cirrhosis and treat all people with cirrhosis as one 
uniform population should be performed. There is a Reply to this 
Brief Communication Arising by Qin, N. et al. Nature 525, http:// 
dx.doi.org/10.1038/nature14852 (2015). 

Major confounders in people with cirrhosis are standard-of-care
therapies such as lactulose, rifaximin, antibiotics and acid-suppressants 
that can affect the gut milieu’®. These alone could explain a large portion 
of the metagenomics changes and have not been accounted for*’”. 
These medications, especially proton pump inhibitors, could also be a 
major reason why oral origin bacteria are found in the intestine, as has 
been shown in prospective cirrhotic and non-cirrhotic studies’*”. 

We hypothesized that there was a significant difference in compen- 
sated versus decompensated cirrhotic microbiota in Qin et al.’, which 
needs to be accounted for in the interpretation. Using 66 enriched/ 
depleted metagenomic sequences (MGS) provided by S. D. Ehrlich, 
we performed linear discriminant analysis (LDA) effect size (LEfSe)'” 
after classifying them into healthy, compensated and decompensated 
subjects. LEfSe uses a factorial Kruskal-Wallis test and LDA to
detect features with significant differential abundance. We found that 
even in the selected data set the authors provided, 17 of 66 MGS were 
different between compensated and decompensated groups (10 MGS 
overexpressed and 7 MGS underexpressed, Fig. 1a). These included 
several oral origin species (Streptococcus oralis and several Veillonella 
spp.), which were the primary study results. We then enrolled 360 age-matched subjects (45 healthy individuals (age 54 ± 3 years, no chronic diseases), 171 compensated (age 54 ± 4 years, median Child-Pugh score 6) and 141 decompensated cirrhotic patients (age 55 ± 2 years, median Child-Pugh score 9)) for stool multi-tagged
pyrosequencing (MTPS)". Using Kruskal-Wallis analysis of relative 
microbial family abundance >1%, we found that compensated 
and decompensated patients were significantly different (Fig. 1b). 
Proteobacteria levels, specifically Enterobacteriaceae, were signifi- 
cantly higher in decompensated cirrhotic patients. This pattern is also 
seen in other recent MTPS studies*"*. Although MGS and MTPS are 
not completely comparable, it is interesting that both resulted in 
similar conclusions. Therefore, there are significant microbiota dif- 
ferences between compensated and decompensated patients that need 
to be separated in cirrhosis microbial studies. 
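Both analyses above rest on the Kruskal-Wallis test, a rank-based alternative to one-way ANOVA. A minimal pure-Python sketch of the tie-corrected H statistic follows, assuming exactly three groups (such as healthy, compensated and decompensated) so that the chi-square p-value with 2 degrees of freedom reduces to exp(-H/2). The figure legend says only that the tests were adjusted for multiple comparisons, so the Benjamini-Hochberg step here is an assumption rather than the correction the authors necessarily used, and the sketch does not reproduce LEfSe's further steps. In practice `scipy.stats.kruskal` computes the same statistic.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else float("nan")


def p_value_three_groups(h):
    """Upper tail of chi-square with 2 degrees of freedom, which equals exp(-h/2).
    Valid only when exactly three groups are compared."""
    return math.exp(-h / 2)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up false-discovery-rate control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In a family-level analysis one would call `kruskal_wallis` once per bacterial family on the three groups' relative abundances, convert each H to a p-value, and pass the whole list to `benjamini_hochberg`.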

In addition, in Qin et al.” the calculation of the model for end-stage 
liver disease (MELD) score in Supplementary Table 1 is inaccurate, 
casting doubt on figure 2. The authors compared diabetes patients with cirrhotic patients to inform their cirrhosis-associated profile.
However, diabetes is prevalent and is associated with a poor prognosis 
in cirrhosis’. Therefore these results are not generalizable to patients 
with cirrhosis and diabetes. 

The present need is not for complicated profiles that are unlikely to 
supplant currently available simple diagnostic strategies, but rather 
for improving prognostication. This is because gut microbiota are 
associated with several cirrhosis-related pre-terminal events such as 
hepatic encephalopathy and infections’. A prior study has shown that 
altered stool microbiota can predict poor outcomes, but further work 
is required’. 
Figure 1 | Microbiota distribution between compensated and decompensated cirrhotic subjects. a, LEfSe plot showing metagenomic species that are overexpressed (green) and under-expressed (red) in decompensated compared to compensated cirrhosis from Qin et al.’. b, In the new data set using MTPS, boxplots showing the median and interquartile range of abundance for statistically significant comparisons between controls (orange), compensated cirrhosis (green) and decompensated cirrhosis (blue), using Kruskal-Wallis tests adjusted for multiple comparisons at the family level. The line in the centre shows the median.
Therefore, the careful separation of the two groups within cirrhosis,
which have different diagnostic criteria and prognoses, and the control 
of confounders owing to drugs mentioned above, are important for the 
correct interpretation of these results and to avoid epiphenomena. 
REPLYING TO J. S. Bajaj, N. S. Betrapally & P. M. Gillevet Nature 525, http://dx.doi.org/10.1038/nature14851 (2015)


In the accompanying Comment’, a concern expressed by Bajaj et al. is 
that diagnostics of liver cirrhosis by microbiome analysis that we 
report” may be mainly due to the microbiome alterations in decompensated patients (DP). To address this, we tested how accurately compensated patients (CP) can be diagnosed by microbiome analysis. Two slightly different criteria for identifying these were used, based
on absence of ascites and hepatic encephalopathy (n= 54) and 
absence of ascites only (n = 57). 

First, we constructed a discriminator of patients (P, n = 98) and healthy controls (H, n = 83) in the discovery cohort, disregarding patient status (CP or DP). As input we used the presence and abundance of the 66 metagenomic species (MGS) differentially represented in the two groups’, and as output the area under the curve (AUC) of a receiver operating characteristic (ROC) analysis, essentially as described previously**. The optimal discriminator required only 7 MGS and yielded an AUC of 0.95 for the discovery cohort and of 0.94 for the validation cohort (P n = 25; H n = 31), values somewhat higher than those observed for the discriminator based on 15 biomarkers’. The discriminator separated CP (n = 54 or n = 57) from H (n = 114) as accurately as DP (n = 69 or n = 66), with an AUC of 0.95 in all cases. This shows that the gut microbiome alterations in the two types of patient have highly similar features. These features are not greatly affected by medication, another concern expressed by Bajaj et al.', as the discriminator separated H (n = 114) with comparable efficiency from P who were taking antiviral medication (n = 52) or not (n = 71), with an AUC of 0.95 for both; who were taking β-blockers (n = 11) or not (n = 112), with AUCs of 0.95 and 0.96, respectively; or who were taking proton-pump inhibitors (PPIs; n = 70) or not (n = 53), with AUCs of 0.96 and 0.93, respectively. We suggest that the inability of Bajaj et al.’ to construct an efficient discriminator of H and CP may be due to the inadequate resolution provided by the widely used 16S ribosomal RNA gene, which generally resolves taxa only to the genus level rather than to the species level achieved by the quantitative metagenomics we deploy”*.
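To make the AUC values above concrete, the sketch below computes a ROC AUC from discriminator scores using its rank (Mann-Whitney) interpretation: the probability that a randomly chosen patient scores higher than a randomly chosen control, with ties counted as one half. The additive scoring rule, the species identifiers and the abundances are hypothetical illustrations, not the discriminator or data used in the study.

```python
from typing import Dict, Sequence, Set


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC via its Mann-Whitney interpretation: the probability that a
    random positive (patient) outscores a random negative (control); ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("each group needs at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def additive_score(abundance: Dict[str, float], enriched: Set[str], depleted: Set[str]) -> float:
    """Hypothetical discriminator: summed relative abundance of patient-enriched
    species minus that of patient-depleted species (absent species count as 0)."""
    up = sum(abundance.get(s, 0.0) for s in enriched)
    down = sum(abundance.get(s, 0.0) for s in depleted)
    return up - down


# Hypothetical species identifiers and relative abundances.
enriched = {"MGS_A", "MGS_B"}
depleted = {"MGS_C"}
patients = [{"MGS_A": 0.4, "MGS_B": 0.2, "MGS_C": 0.1}, {"MGS_A": 0.3, "MGS_C": 0.2}]
controls = [{"MGS_A": 0.05, "MGS_C": 0.5}, {"MGS_B": 0.1, "MGS_C": 0.3}]

patient_scores = [additive_score(x, enriched, depleted) for x in patients]
control_scores = [additive_score(x, enriched, depleted) for x in controls]
auc = roc_auc(patient_scores, control_scores)
print(auc)  # 1.0
```

Because a fixed score can be evaluated on any subset, checking whether compensated and decompensated patients are separated from controls equally well amounts to calling `roc_auc` on each patient subgroup against the same control scores.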

Notwithstanding the similarity of the gut microbiome alterations in CP and DP, there are also differences between the two groups, as suggested by the association between disease-severity scores and the load of liver cirrhosis-enriched species’. Bajaj et al.’ rightly point out an inaccuracy in the calculation of the model for end-stage liver disease (MELD) score in our report’, which followed previous literature; however, the correction had only a modest effect, the statistical significance of the difference between the scores of patients with the lowest and the highest LC quartile load being P < 2 × 10° rather than the reported P < 1 × 10°.

To explore further the microbiome alterations in CP and DP, we searched for MGS with significantly different abundance in the two groups, following the approach used to identify the 66 species enriched in the H or P groups’. Some 30 such MGS were found in the discovery cohort (CP n = 45; DP n = 54), but only 13 of these were not already in the set of 66. All 79 species were used to construct the best discriminator for the discovery cohort. It was based on 14 MGS and separated CP from DP with an AUC of 0.87 in the discovery cohort and of 0.84 in the validation cohort (CP n = 9; DP n = 16).

This analysis confirms our finding that alterations of the gut microbiome are associated with disease severity. However, it provides no evidence for an abrupt (saltatory) shift to a different composition upon decompensation, which, as Bajaj et al.' suggest, could confound microbiome analysis; a gradual alteration with increasing severity would lead to the same result.

Patients with diabetes were excluded at enrolment in our study’. Furthermore, invasion of the gut by oral species was not observed in previous studies of type 2 diabetes, despite the use of quantitative metagenomics, which would easily have revealed such species had they been present®®. Alterations of the gut microbiome owing to liver cirrhosis are therefore unlikely to be confounded by diabetes, and diagnosis of the two conditions by gut microbiome analysis remains a real possibility. Short-term nutritional changes, such as a hospital diet, generally have only a modest effect on gut microbiota; long-term dietary patterns, which affect it more’*, do not differ greatly between cirrhosis patients and healthy controls in the Chinese population from which the participants in our study were drawn’.

In conclusion, although we endorse the call of Bajaj et al.’ for caution regarding potential confounders in microbiome analysis, we strongly disagree with their suggestion that the alterations we report are “epiphenomena” rather than actual differences in the gut microbial communities associated with liver cirrhosis. We suggest that microbiome analysis might supplant currently inadequate clinical diagnostic parameters and/or invasive procedures such as liver biopsy for detecting compensated cirrhosis.
Forgetfulness illuminated


Memories are stored in the complex network of neurons in the brain. With the help of innovative tools to manipulate the 
connections between neurons, memories in mice can now be erased with a beam of light. SEE ARTICLE P.333 


JU LU & YI ZUO


More than a century ago, the German zoologist Richard Semon proposed that memories leave physical traces in the brain, and coined the term ‘engram’ to describe such traces’. Although the concept has gained general recognition, the
search for the engram is ongoing. In this 
regard, the synapse — a specialized connect- 
ing region between neurons — has received 
much attention, but there is still no direct 
evidence of a causal link between synaptic 
changes and memory formation. In this issue, 
Hayashi-Takagi et al.’ (page 333) fill this gap. 
Using ingenious protein engineering and live 
imaging, the authors identify which synapses 
are activated when a mouse learns a motor 
skill, and then weaken these synapses to erase 
motor memory. 

Most synapses in the brain form between 
axons (neuronal ‘output cables’) and dendrites 
(input cables). Signals to excitatory synapses 
are usually received by micrometre-sized 
protrusions called spines that emanate from 
dendrites. The size of the spine head correlates 
with the strength of the synapse’. Spines may 
emerge, disappear or change in size during 
learning and memory formation, reflecting 
changes in the wiring of neuronal circuits’. 

To investigate the causal relationship between the formation of motor memories and the structural potentiation of spines (spine formation or enlargement), Hayashi-Takagi et al. developed an ‘optoprobe’ called AS-PaRac1 that manipulates potentiated spines in response to light. The DNA construct for AS-PaRac1 encodes a light-activatable version of the small signalling protein Rac1, whose prolonged activity induces spines to shrink. The construct also incorporates the dendrite-targeting sequence of the gene Arc, which is expressed rapidly and transiently in response to neuronal activity, ensuring that the probe moves to dendritic spines that are undergoing structural potentiation. The AS-PaRac1 optoprobe is the first optogenetic tool to enable the manipulation of potentiated spines.

Hayashi-Takagi and colleagues expressed AS-PaRac1 in the motor cortex of mice and trained the animals to run on an accelerating rotating rod known as a rotarod. Light activation of AS-PaRac1 in potentiated spines after learning caused the spines to shrink, disrupting the animals’ ability to run on the rotarod. This demonstrates a causal relationship between synaptic strength and motor memory in this context (Fig. 1).

Figure 1 | Inducing forgetting. A neuron receives excitatory signals from other neurons through dendritic spines. When a mouse learns a new task, such as running on an accelerating rotating rod (a rotarod), spines involved in learning this task become potentiated (new spines form and existing spines increase in size). Hayashi-Takagi et al.’ developed an ‘optogenetic construct’, based on a light-activatable form of the small signalling protein Rac1, that targets recently potentiated dendritic spines. Blue light activates the modified Rac1, which induces shrinkage of the spines. The authors found that spine shrinkage caused the mouse to forget the skill it had learnt, so it soon fell off the rotating rod.

Next, the authors showed that the effect of 
the probe is task-specific. When mice learnt to 
run on the rotarod and then learnt to walk on 
a thin beam, disrupting the spines that were 
potentiated during beam walking did not affect 
performance on the rotarod. Furthermore, 
AS-PaRac1 activation in spines that spontaneously potentiated two days after learning (presumably because of unrelated motor tasks) did
not affect motor performance. Finally, when 
the authors retrained mice on the same task 
for which spine potentiation had been dis- 
rupted, most of the optically shrunken spines 
reverted to their original potentiated sizes. 
Together, these results suggest that distinct 
subsets of synapses are altered in a task-spe- 
cific way during motor learning and memory 
formation. 

In the long quest for the engram, neuro- 
scientists have reached the consensus that the 
mammalian brain stores different memory 
traces in different subsets of neurons in spe- 
cific regions. Methods for labelling, imaging, 
activating and silencing neurons in animals 
have enabled researchers to map the ensemble 
of neurons that correlates with a particular
learning task, to manipulate their activities, and 
even to generate artificial memory traces*®. 
However, a single neuron may participate in 
the processing and storage of more than one 
distinct piece of information’. Therefore, the 
engram of a particular memory involves not 
only the identity of the constituent neurons, 
but also the entire set of synaptic connections 
between these neurons. How memory is allo- 
cated at this synaptic level remains unclear. 

To qualify as an engram, a synaptic circuit 
should satisfy several criteria. First, changes in 
synaptic structures and function should corre- 
late with learning. Second, blocking such syn- 
aptic modifications should prevent memory 
formation, demonstrating the need for these 
changes. And third, artificially inducing syn- 
aptic changes should be sufficient to produce 
a memory without the need for behavioural 
training. Over the past decade, in vivo imag- 
ing has revealed® that the dynamic formation 
and elimination of dendritic spines correlates 
with motor-skill learning and memory. Now, 
Hayashi-Takagi and colleagues have taken 
the next step, by establishing necessity — 
they show that undoing the synaptic changes 
that accompany motor learning does indeed 
disrupt the memory. 

The development of genetic and optical tools such as AS-PaRac1 promises to enable dissection of the finer details of the engram. The use of promoter sequences that drive the expression of target genes in a cell-type-specific manner, as well as connectivity-specific labelling methods’, can help to unravel the roles in learning and memory of synaptic circuits formed by different types of neuron, revealing, for example, the relative contributions of excitatory and inhibitory neurons, or of neurons in different layers of the brain’s cortex. When we have a deeper understanding of the molecular signalling events that occur at synapses during memory formation”, tools similar to AS-PaRac1 can be devised to modulate other components of the molecular machinery. Improved microscopy techniques can already target individual neurons or synapses’, rather than manipulating a population of neurons as a whole.


When used together, such technical advances 
will enable us to strengthen existing engrams, 
to facilitate the formation of new ones, and 
to generate synthetic memory traces at the 
synaptic level. We will then be able to study 
the interaction between different memory 
traces, as well as the mechanisms that trans- 
late an engram into behavioural outputs. These 
efforts should allow us to gain an under- 
standing of the intriguing phenomenon 
of memory simply by shining a light on its 
physical basis.
Tens of thousands of
atoms replaced by one 


Many catalysts comprise metal nanoparticles on solid supports. The discovery 
that single atoms of palladium anchored to a solid support also exhibit high 
catalytic activity might help to conserve the supply of this and related rare metals. 


JOHN MEURIG THOMAS 


The platinum-group metals (ruthenium, rhodium, palladium, osmium, iridium and platinum) are extensively used
as catalysts in industries that produce com- 
pounds such as agrochemicals, dyestuffs and 
pharmaceuticals, and several of them are 
crucial components of catalytic converters in 
cars. But as demand for these relatively scarce 
metals increases, their future availability is a 
cause for concern. This would be dispelled 
if the metals could be used in an atomically 
dispersed state, rather than as nanoparticles 
containing up to 100,000 atoms, as is con- 
ventional. Writing in Angewandte Chemie, 
Vilé et al.’ report that individual atoms of 
palladium can be anchored to carbon nitride 
(C₃N₄), an easily prepared nanoporous solid’.
The resulting materials are excellent, thermally 
stable catalysts for selective hydrogenation 
reactions, which facilitate the production of 
many organic substances, including polymers 
and biologically important compounds”. 
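Why atomic dispersion conserves metal can be illustrated with a crude geometric estimate. The sketch below assumes, purely for illustration, that a nanoparticle is a cube of atoms and that only atoms on its faces can take part in catalysis; real particles are closer to spherical and their surface sites are not all equivalent. Under this assumption, roughly one atom in eight of a 100,000-atom particle is exposed, whereas every atom of a single-atom catalyst is.

```python
def atoms_per_edge(total_atoms: int) -> int:
    """Edge length (in atoms) of the cube closest to the requested atom count."""
    return round(total_atoms ** (1 / 3))


def surface_fraction(n: int) -> float:
    """Fraction of atoms lying on the surface of an n x n x n cube of atoms
    (crude model: every atom not fully buried counts as exposed)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total = n ** 3
    interior = max(n - 2, 0) ** 3  # atoms with no exposed face
    return (total - interior) / total


n = atoms_per_edge(100_000)  # 46 atoms per edge, i.e. 97,336 atoms
print(n, round(surface_fraction(n), 3))  # 46 0.125
print(surface_fraction(1))  # a single atom is entirely exposed: 1.0
```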
There are many examples of catalysts in 
which the active components are supported 
nanoparticles of platinum-group metals 
(PGMs) or gold (see refs 4-6, for example). But 
in several cases, it has long been suspected’? 
that the nanoparticles are unimportant, and 
that catalysis occurs at single-atom sites. 
Indeed, isolated metal atoms have previously been manipulated as a strategy for enabling selective catalytic hydrogenations”’: atomically dispersed palladium atoms on the surfaces of a copper crystal stimulate local breaking of the bonds in hydrogen molecules, and the resulting hydrogen atoms become mobile on the copper surface, readily reacting with unsaturated molecules such as acetylene and styrene. However, in that system, the single atoms are laid down on the copper by heating a palladium source in a high-vacuum chamber using an electron beam. This method is suitable for preparing single-atom catalysts of other PGMs, but does not readily translate to the production of industrial-scale quantities of catalysts.

Figure 1 | A single-atom palladium catalyst. Vilé et al.’ report that isolated palladium atoms on a solid support of carbon nitride (C₃N₄; carbon atoms, grey; nitrogen atoms, purple) act as catalysts for hydrogenation reactions. Strong bonds to the nitrogen atoms firmly anchor the palladium atoms in roughly triangular pores in the stacked, two-dimensional layers of the support. Only one layer is depicted, for simplicity. (Adapted from ref. 1.)

In their study, Vilé and colleagues propose 
that the catalytically active individual 
palladium atoms are tenaciously attached 
to the nitrogen atoms of the C₃N₄ support
(Fig. 1), owing to the lone pair of electrons 
that each nitrogen atom has"’. The authors’ 
X-ray-absorption studies found no evidence of 
palladium–palladium bonds, indicating that
the atoms are indeed separate from each 
other. The researchers also studied their 
samples using a technique called annular 
dark-field electron microscopy”, which 
takes advantage of the Rutherford scatter- 
ing of electrons’ (scattering at large angles) 
to detect heavy atoms of PGMs on the light 
elements of C₃N₄. These experiments identified only single palladium atoms in the
active catalyst. 

Vilé and co-workers’ catalysts are particu- 
larly notable because reproducible, thermally 
stable single-atom preparations can be readily 
made, provided that care is taken to incorpo- 
rate only small amounts of the palladium on 
the nanoporous support. Moreover, C₃N₄ is
inexpensive and may be routinely prepared 
in a graphite-like form”" that has relatively 
widely separated layers, thereby increasing 
the accessibility of the anchored palladium 
atoms to reactants. The authors report that 
it also has the merit of a high surface area 
(about 150 square metres per gram), which
maximizes catalytic performance. 

The main hydrogenation reaction studied 
by the authors was the conversion of 1-hexyne 
to 1-hexene, in which carbon-carbon triple 
bonds are selectively converted to double 
bonds, but not further to single ones. Such 
selective hydrogenations have convention- 
ally used a Lindlar catalyst’, which con- 
sists of nanoparticles of a palladium-lead 
compound”* on a calcium carbonate support. 
The authors’ single-atom palladium catalyst 
enables much higher yields and faster reactions 
than either a Lindlar catalyst or hydrogenation 
catalysts based on nanoparticles of platinum 
or gold”. It also yields 1-hexene with greater 
than 99% selectivity. Moreover, after repeated 
use (five successive tests), the catalyst displays 
no decrease in either selectivity or the fraction
of 1-hexyne converted to 1-hexene. Finally, 
Vilé and colleagues report that their single- 
atom catalyst enables the hydrogenation of 
nitrobenzene to form aniline exclusively. This 
kind of reaction is used to make compounds 
for the dyestuffs industry and key intermedi- 
ates in the manufacture of agrochemicals and 
pharmaceuticals. 

Single-atom solid catalysts are of consider- 
able interest, from both a practical”’*”” and 
a theoretical” perspective, not only because 
selective hydrogenations are among the most 
valuable conversions in industrial chemistry, 
but also because these reactions are atom effi- 
cient’: the minimum number of atoms is used 
in each reaction, reducing waste. Such catalysts 
can also be used for other reactions. For exam- 
ple, recent work'® shows that single atoms of 
platinum function as atom-efficient catalysts 
for the water-gas shift reaction, which is used 
to generate pure hydrogen for the synthesis of 
ammonia. Single-atom platinum catalysts have 
also been used for the selective hydrogenation 
of nitroaromatic compounds’. 

Readily prepared single-atom platinum catalysts’ supported on iron oxide (FeOx) have been reported to be much more active, selective and durable than analogous nanoparticle platinum catalysts. Remarkably, single atoms of platinum supported on FeOx are chemoselective in the hydrogenation reactions that they catalyse; that is, they can discriminate between two or more regions of a molecule that could potentially react with hydrogen. For example, the catalysts convert nitro groups (NO₂) to amino groups (NH₂), but leave carbonyl groups (C=O) and benzene rings untouched. In one such reaction, a single-atom platinum catalyst displayed a turnover frequency (the number of reactant molecules converted to product per active site per unit of time) of 1,500 per hour, which is 20 times as high as the previous best result reported in the literature. The selectivity for the substrate of that reaction was about 99%, the highest reported for any PGM catalyst.
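The turnover-frequency comparison is simple arithmetic. The sketch below assumes the conventional definition (molecules converted per active site per hour) and uses hypothetical inputs chosen only to reproduce the quoted 1,500 per hour; it also shows that "20 times as high" implies a previous best of about 75 per hour.

```python
def turnover_frequency(molecules_converted: float, active_sites: float, hours: float) -> float:
    """Turnover frequency: product molecules formed per active site per hour."""
    if active_sites <= 0 or hours <= 0:
        raise ValueError("active_sites and hours must be positive")
    return molecules_converted / (active_sites * hours)


# Hypothetical run: 3,000 molecules converted per platinum atom over 2 hours.
tof_single_atom = turnover_frequency(molecules_converted=3000.0, active_sites=1.0, hours=2.0)
print(tof_single_atom)  # 1500.0

# A value "20 times as high as the previous best" implies a previous best of:
previous_best = tof_single_atom / 20
print(previous_best)  # 75.0
```

For a single-atom catalyst, the number of active sites can reasonably be taken as the number of metal atoms. For nanoparticles, only surface atoms are exposed, so normalizing by total metal content would understate the per-site rate.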

The future looks bright for the use of PGMs as catalysts, both on laboratory and industrial
scales, because the preparation of most kinds 
of single-atom metal catalyst is likely to be 
straightforward, and because characteriza- 
tion of such catalysts has become easier with 
the advent of techniques that readily discrimi- 
nate single atoms from small clusters and 
nanoparticles. A prerequisite is to find ways 
of securely anchoring single atoms of these 
expensive metals to high-surface-area, cheap 
and plentiful solids composed of elements 
that are abundantly available, as Vilé et al. and 
others'® have done. If this can be achieved gen- 
erally, then the future deployment of PGMs in 
solid catalysts will be transformed.
Perplexing effects of
phenotypic plasticity 


Research on guppies provides evidence that phenotypic plasticity — an organism’s 
ability to alter its characteristics in response to changes in the environment — can 
both constrain and facilitate adaptive evolution. SEE LETTER P.372 


JUHA MERILÄ


Altered or new environmental conditions, such as those brought about by climate change, are important sources of selection pressures that drive
organismal adaptation and evolution. But 
alongside genetic adaptation, organisms 
can respond to environmental challenges 
through adaptive phenotypic plasticity, which 
refers to a non-genetic shift in the average 
characteristics (phenotype) of a population 
towards an evolutionary optimum. Whether 
phenotypic plasticity generally facilitates or 
constrains adaptive (genetic) evolution remains 
a contentious issue’ *. On page 372 of this 
issue, Ghalambor et al.” provide experimental 
evidence from guppies suggesting that adap- 
tive phenotypic plasticity in gene-expression 
patterns constrains evolution. But they also 
find that non-adaptive plasticity — pheno- 
typic changes that do not directly contribute 
to increased fitness under the changed con- 
ditions — may facilitate adaptive genetic 
change by increasing the strength of natural 
selection. 
The authors’ experiments involved transplanting wild Trinidadian guppies (Poecilia
reticulata; Fig. 1) from a stream that also 
hosted predatory cichlid fish into two replicate 
streams without cichlids. They then compared 
patterns of brain gene expression between the 
introduced and original (ancestral) popula- 
tions after three or four generations. Parallel 
changes in gene expression had occurred for 
135 genes in the two introduced populations, 
and these new levels of gene expression were 
similar to those exhibited by a native cichlid- 
free population. This suggested rapid adaptive 
evolution in the introduced populations. 

However, the evolved differences were 
mostly (89% of the genes) in the opposite 
direction to that of phenotypic plasticity in 
expression patterns in the ancestral popula- 
tion. This was inferred by comparing the gene 
expression in ancestral fish reared in either 
the presence or absence of chemical cues 
from predatory cichlids. Thus, the pheno- 
typic plasticity in these genes can be consi- 
dered non-adaptive. The remaining 11% of 
genes exhibited adaptive plasticity: the evolved differences in gene expression in the experimentally introduced populations were concordant with the direction of change of expression levels in ancestral fish raised in the absence of predatory-fish cues. The authors also observed that there was little or no population divergence in the expression of these genes in either of the introduced populations.

Figure 1 | Trinidadian guppies. Ghalambor et al.° show that high levels of non-adaptive phenotypic plasticity in a source population seem to facilitate rapid adaptive evolution of gene-expression patterns in guppies transplanted into a different environment.
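The inference described above rests on comparing, gene by gene, the sign of the plastic response with the sign of the evolved change. The sketch below shows that logic with hypothetical expression differences; it ignores the replicate populations, statistical thresholds and normalization that a real analysis would require. For scale, the reported 89% of the 135 genes corresponds to roughly 120 genes classed as non-adaptive.

```python
from collections import Counter


def classify_plasticity(plastic_change: float, evolved_change: float) -> str:
    """Classify a gene's plasticity by comparing directions of change.

    plastic_change: ancestral expression without predator cues minus
        ancestral expression with cues (direction of the plastic response).
    evolved_change: expression in an introduced population minus ancestral
        expression (direction of evolved divergence).
    """
    product = plastic_change * evolved_change
    if product > 0:
        return "adaptive"  # evolution moved in the same direction as plasticity
    if product < 0:
        return "non-adaptive"  # evolution reversed the direction of plasticity
    return "undetermined"  # no change in at least one of the two comparisons


# Hypothetical (plastic_change, evolved_change) pairs for four genes.
genes = {
    "gene_a": (0.8, -0.5),
    "gene_b": (-0.3, 0.4),
    "gene_c": (0.2, 0.1),
    "gene_d": (-0.6, 0.9),
}

labels = {name: classify_plasticity(p, e) for name, (p, e) in genes.items()}
counts = Counter(labels.values())
share_non_adaptive = counts["non-adaptive"] / len(genes)
print(labels)
print(share_non_adaptive)  # 0.75
```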

The latter findings support evolutionary 
models predicting that adaptive phenotypic 
plasticity should weaken the strength of direc- 
tional selection and thereby slow the rate of evo- 
lution (see refs 6 and 7 for examples). However, 
the real stunner of the study was the discovery 
that most of the evolved (genetic) differences 
in gene-expression patterns in the introduced 
guppy populations had taken place in the oppo- 
site direction to the direction of plasticity in the 
ancestral population. This inverse relationship 
between the direction of plasticity and the direc- 
tion of adaptive evolution suggests that non- 
adaptive plasticity may facilitate (in the authors’ 
words, potentiate) evolution by increasing the 
strength of directional selection required to 
create the observed divergence in gene- 
expression patterns. 

The authors obtained support for the 
hypothesized increase in directional selection 
against non-adaptive plasticity by examining evolutionary changes in the magnitude of plasticity (quantified as the mean difference in transcript expression levels between predator-cue-treated and untreated groups) between ancestral and introduced populations. Although phenotypic plasticity is, by definition, a non-genetic response to environmental cues,
the capacity to express it, and its magnitude, 
can be genetically variable’*. Consequently, if 
directional selection had acted most strongly 
on gene transcripts exhibiting non-adaptive 
plasticity, then the magnitude of plasticity in 
introduced populations in response to this 
selection should be reduced. This was just 
what Ghalambor et al. observed. Moreover, 
the decline in the magnitude of plasticity in 
the introduced populations was inversely pro- 
portional to plasticity in the ancestral popula- 
tion. This also aligns with the expectation that 
transcripts exhibiting the greatest non-adap- 
tive plasticity should be the ones that are most 
strongly selected against. 

Although the findings that phenotypic 
plasticity can both constrain and facilitate 
evolutionary (genetic) adaptation are not 
unprecedented, several features of Ghalambor 
and colleagues’ study set it apart from earlier 
work on this topic. For instance, instead of 
focusing on a limited number of traits, the 
authors assessed the plasticity of a large num- 
ber of traits (expressed genes), which allowed 
them to draw robust quantitative conclusions. 
Nevertheless, a question to be addressed is 
whether results from gene-expression analy- 
ses can be extended and generalized to macro- 
scopic traits that have more-direct ecological 
relevance. Similarly, most previous empirical 
studies that focused on the direction of plas- 
tic responses and the direction of subsequent 
evolutionary divergence in wild populations 
have been limited to comparisons between 
ancestral and derived populations long after
they diverged. The new study’s focus on initial 
patterns of plasticity and subsequent rapid 
adaptive divergence in the wild provides a 
thought-provoking complement to laboratory 
experiments that have provided evidence sup- 
porting both positive (adaptive)*” and nega- 
tive (non-adaptive)"” relationships between the 
directions of plastic responses and evolution. 

Ghalambor and colleagues’ results are also 
intriguing because most (but not all) attempts 
to model the effects of plasticity on subse- 
quent evolution have assumed it to be adap- 
tive. Thus, the observed negative relationship 
between the direction of plasticity and the 
direction of evolution in guppies may guide 
future theoretical work in the field. Further- 
more, although increased strength of selec- 
tion caused by non-adaptive plasticity may 
contribute to rapid adaptation and increase 
the likelihood of population persistence, it 
may also lead to reduced population size and 
an increased risk of demographic collapse’. 
By reducing population size, selection stem- 
ming from non-adaptive plasticity may expose 
a population to an increased rate of random 
genetic changes owing to a process known 
as genetic drift. This would in turn promote
loss of genetic variation and reduced efficiency 
of selection, counteracting the proposed 
benefit from non-adaptive plasticity. 

As fascinating as it is to suggest that 
maladaptive plasticity may be a strong driver 
of evolution, sceptics may require further 
experimental studies from the wild with more 
population replicates and with a focus on traits 
with established ecological relevance (such as 
behaviours and morphology) to be convinced. 
Such studies would also be helpful, if not essen- 
tial, in developing parameters for models that 
aim to understand how the interplay between 
phenotypic plasticity, natural selection and 
random genetic drift influences evolutionary 
changes.
Repositioned to kill stem cells


Chemotherapy-resistant cancer stem cells make it hard to cure many forms
of the disease. Repositioning an existing drug to tackle this problem could 
significantly improve treatment for one form of leukaemia. SEE LETTER P.380 


TESSA HOLYOAKE & DAVID VETRIE 


In most cases of chronic myeloid leukaemia (CML), a daily oral medication can rapidly transform a progressive and ultimately fatal cancer into a chronic but manageable condition. But this is not a cure. The persistence of quiescent (dormant, non-cycling) and thus drug-resistant leukaemic stem cells (LSCs) poses an unmet clinical challenge, and any attempt to cure CML must aim to eradicate these cells. In this issue, Prost et al.’ (page 380) present provocative preclinical and early clinical findings demonstrating that a drug currently used for diabetes therapy can be repositioned to target a pathway that controls quiescence in LSCs, causing the gradual erosion of this cellular pool.

The cause of CML is a mutation in a normal blood stem cell involving an exchange of genetic material between chromosomes 9
and 22. This translocation creates a cancer- 
driving gene known as BCR-ABL1, which 
produces a protein with enhanced activ- 
ity as a tyrosine kinase enzyme, leading to 
uncontrolled cell proliferation. BCR-ABL1 has 
been shown to be sufficient to drive the devel- 
opment of leukaemia in mouse models’, and 
the discovery of this protein led to the develop- 
ment of tyrosine kinase inhibitors (TKIs) for 
CML treatment. 

In the past two decades, TKIs have dramati- 
cally improved the outcome for people with 
this cancer. Most of those who present with 
early disease respond rapidly to TKI therapy 
and go into long-lasting remission. How- 
ever, TKIs fail to eradicate LSCs, the cells that 
initiate and maintain CML, and these drug- 
resistant cells can drive relapse, or evolve to 
cause further forms of TKI resistance and more-aggressive disease. As a result, people on life-long TKI therapy are exposed to associated, often serious, side effects and may cease to respond to the treatment at any time. Furthermore, the significantly improved survival of those taking TKIs means that the prevalence of CML is increasing each year, with inherent social and economic implications.

Figure 1 | Targeting leukaemic stem cells in chronic myeloid leukaemia. Prost et al.' describe a molecular pathway, involving the receptor PPARγ, the transcription factors STAT5 and HIF2α, and the regulatory protein CITED2, that induces leukaemic stem cells (LSCs) to enter a dormant (quiescent) state. They also show that the drug pioglitazone, approved for diabetes treatment, activates PPARγ to block this pathway, and can kill these cells when used in conjunction with tyrosine kinase inhibitors (TKIs), which inhibit the protein BCR-ABL1 and thus STAT5, and which are the standard therapy against active (cycling) leukaemic cells. Several drugs used to treat other diseases, such as axitinib, arsenic trioxide and hydroxychloroquine, have also been repositioned to treat chronic myeloid leukaemia, but these have different mechanisms of action.

Several potential mechanisms to explain the insensitivity of LSCs to TKIs have been proposed, including cellular quiescence. Prost et al. report that quiescence in LSCs is regulated by a pathway involving the receptor PPARγ, the transcription factors STAT5 and HIF2α, and the protein CITED2, which is known’ to regulate blood stem-cell quiescence (Fig. 1). A particular strength of the study was the use of primary blood stem cells (expressing the marker CD34) from people with CML to dissect the pathway and confirm the role of each component in regulating LSC quiescence.

The authors go on to show that combining imatinib, the standard TKI used to manage CML, with the antidiabetic agent pioglitazone, which activates PPARγ, blocks this pathway in CML cells. The synergistic effects of the drugs reduce STAT5 expression and activity, downregulate HIF2α and CITED2 expression, and trigger the death of quiescent LSCs. Although the mechanism by which LSCs are killed in response to this drug combination is not clear, they are probably either killed directly or driven to exit quiescence, which may lead to their eradication by the TKI. The authors also demonstrate that the compound JQ1, a bromodomain inhibitor with broad activity that includes the suppression of STAT5 activity, is as effective as pioglitazone (in combination with imatinib). Although this finding supports a role for the STAT5 pathway in LSC quiescence, the door is still open for studies of other agents that may target LSCs through this or alternative pathways.

Collectively, these results strengthen 
the concept that cancer stem cells exhibit 
vulnerabilities in otherwise normal molecular 
pathways that may be targeted in a selective 
manner to obtain a cure. Earlier work dem- 
onstrated that CML stem-cell quiescence is 
in part maintained by the promyelocytic 
leukaemia tumour-suppressor protein, which 
can be targeted by arsenic trioxide’, and that 
the cellular process of autophagy functions 
as a survival pathway for CML stem cells that 
can be targeted by repositioning the anti- 
malarial agent hydroxychloroquine’ (Fig. 1). 
Both of these approaches are currently 
under investigation in the clinic. 

Prost et al. also tested the addition of 
pioglitazone to imatinib therapy in three peo- 
ple with CML, and found that they converted 
from having demonstrable residual leukae- 
mia to being disease-free. The effect lasted for 
months to years after pioglitazone treatment
ceased. These data provided a strong rationale
for a phase II clinical trial, which started in July 
2009 (ACTIM EudraCT 2009-011675-79). 
Although the interim results from this trial are 
encouraging, the study is non-randomized, 
so it will be difficult to ascertain definitively 
that improved response rates are driven by 
pioglitazone. 

Despite the need for further clinical testing 
of this combination therapy, Prost et al. have 
demonstrated the substantial potential for 
drug repositioning in CML research. Their 
results follow a recent report® in which axitinib, 
a TKI approved for the treatment of drug- 
resistant renal-cell cancer, was repositioned to 
tackle TKI resistance in CML. Using drugs that 
have already been approved for other purposes 
can shorten the drug-development pathway by 
5-10 years and reduce risks and costs. 


Although drug repositioning can be rather serendipitous, Prost and colleagues had a tangible rationale that PPARγ activators such as pioglitazone warranted investigation in CML on the basis of their observation’ of the drugs’ activity against a cell-line model of the disease. Already around 30% of drugs newly approved for a particular treatment have been repositioned from another therapy, and such hypothesis-driven repositioning strategies are likely to become more common in cancer drug discovery. This figure is set to rise further as our understanding of cellular pathways and processes increases and as innovative computational approaches are used to facilitate disease-, drug- and treatment-oriented drug repositioning. It is clear that repositioning will increasingly help to fast-track drugs into the clinic. As demonstrated by Prost and colleagues, this could soon signal the beginning of the end for stem-cell quiescence in CML and other cancers.


CONDENSED-MATTER PHYSICS


Charge topology in superconductors


X-ray images of cuprate superconductors reveal the fractured, defect-riddled backbone on which superconductivity develops. The results take us a step closer to understanding how supercurrent flows on small spatial scales. SEE LETTER P.359


ERICA W. CARLSON


The quantum motion of electrons enforces a high degree of homogeneity in conventional materials such as metals and semiconductors. As a result, the electrons spread out evenly in these materials, like liquid filling a container. By contrast, nanoscale images of copper oxide (cuprate) superconductors have revealed that the materials’ electrons form clumps at the surface’. On page 359 of this issue, Campi et al.” report X-ray images of a cuprate superconductor, revealing complex patterns of electrons’ that are also scaffolded throughout the interior of the material, and on a much larger scale than has been observed before. Like the skeleton of a coral reef, which reveals more complexity the larger the scale on which it is observed, the electrons in these materials form structures full of gnarled hollows of varying size.

Campi and colleagues find that the size of the patterns formed by electrons is strongly tied to the degree to which the cuprate superconductor is doped, meaning that a small amount of one type of atom is substituted for another to change the charge available for conducting current through the material. Materials that have a uniform distribution of electrons, such as semiconductors, are typically robust against spatial variations in doping level. Nanostructures are a notable exception: variations in doping level affect the performance of the smallest semiconductor devices. The electrons inside cuprate compounds that superconduct at high temperatures (up to 160 kelvin) spontaneously form nanostructures, and so these materials are sensitive to local doping variations.

The authors made their discovery using a technique called scanning micro X-ray diffraction. In this … material. It was already known that the charge in cuprate superconductors is often locally ordered into a unidirectional pattern, with a periodicity of about four crystalline unit cells (the smallest periodically repeating structures in a crystal), so that the electron-density distribution resembles striped wallpaper”. Campi et al. scan a micrometre-sized X-ray beam across a superconducting sample to probe how the character of this stripy charge-density wave varies from spot to spot in the material. But instead of a single sheet of ‘striped wallpaper’, they find rips, tears and patches in the electronic texture, as though someone had papered a wall with sheets of many different sizes and shapes, and with complete disregard for whether the borders matched up.

As in nanostructured semiconductors, these electronic textures in cuprate superconductors are sensitive to local variations in the doping level on nanometre and micrometre scales. For example, the lower the level of local oxygen doping, the higher the contrast of the charge-density waves (bright wallpaper stripes), introducing greater disorder into the overall charge pattern.

These effects are important for superconductivity because of two factors: dimensionality and connectivity. The behaviour of electrons ultimately depends on the shape of their quantum-mechanical waves, and waves show vastly different behaviour in different dimensions. In three dimensions, the energy that a wave carries per unit area falls off as $r^{-2}$ with distance $r$ from its source, as in sound waves coming from a speaker. In two dimensions, such as in ripples emanating from a pebble thrown into a pond, the distance dependence changes to $r^{-1}$. In one dimension, waves cannot dissipate by spreading out. Like the bow waves of canal tugboats, there is only one way for a wave to go in one dimension: forward. The charge-density structures that Campi et al. find are enticing, because they effectively reduce the dimensionality that electrons can explore, which can lead to mechanisms of superconductivity that are fundamentally different from those of conventional superconductors®”.
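The inverse-power laws above follow from energy conservation alone. As a minimal sketch, assume a point source radiating a steady power $P$ into a uniform, non-absorbing medium. The same power must then cross every "shell" around the source, so the intensity at distance $r$ is $P$ divided by the size of that shell: a sphere's area in three dimensions, a circle's circumference in two, and a pair of points in one. This gives power per unit area, per unit length and per point, respectively:

$$
I_{3}(r) = \frac{P}{4\pi r^{2}} \propto r^{-2}, \qquad
I_{2}(r) = \frac{P}{2\pi r} \propto r^{-1}, \qquad
I_{1}(r) = \frac{P}{2} \propto r^{0},
$$

or, in general, $I_{d}(r) \propto r^{-(d-1)}$ in $d$ dimensions. A one-dimensional wave that travels only forward, like the canal bow wave, keeps the full power $P$ at every distance.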

However, the dimensionality that Campi et al. infer is not an integer. They find that the sizes of the patches formed by the charge-density waves are distributed according to a power law, which is characteristic of fractal, scale-free structures. Like any good fractal, these patterns look similar whether they are observed from close up or far away. Although much theoretical effort has gone into understanding electrons in three, two and one dimensions, we know little about the behaviour of electrons in fractal dimensions.
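A power law is the signature of scale invariance. For a size distribution $p(x) \propto x^{-\alpha}$, rescaling every size by a factor $k$ multiplies the density by the same constant $k^{-\alpha}$, so no patch size is special. The Python sketch below shows one common way to estimate such an exponent from a list of patch sizes: the continuous maximum-likelihood estimator of Clauset, Shalizi and Newman. It also includes a crude scale-invariance check. The data are synthetic, drawn with a hypothetical exponent of 2.0 and a minimum size of 1.0. They are not Campi et al.'s measurements, and the function names are illustrative.

```python
import math
import random

def sample_power_law(n, alpha, x_min, rng):
    """Draw n sizes from p(x) ∝ x**(-alpha), x >= x_min, by inverse-transform sampling."""
    # 1 - rng.random() lies in (0, 1], so the power below is always finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def fit_exponent(sizes, x_min):
    """Continuous maximum-likelihood estimate of alpha (Clauset, Shalizi & Newman, 2009)."""
    tail = [x for x in sizes if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def survival_ratio(sizes, x, k=2.0):
    """Empirical P(X > k*x) / P(X > x); for a pure power law this equals k**(1 - alpha) at every x."""
    above_x = sum(1 for s in sizes if s > x)
    above_kx = sum(1 for s in sizes if s > k * x)
    return above_kx / above_x

rng = random.Random(42)
# Hypothetical patch sizes, not experimental data.
patch_sizes = sample_power_law(20_000, alpha=2.0, x_min=1.0, rng=rng)
alpha_hat = fit_exponent(patch_sizes, x_min=1.0)
print(f"estimated exponent: {alpha_hat:.3f}")
for x in (1.0, 4.0, 16.0):
    print(f"P(X > {2 * x:g}) / P(X > {x:g}) = {survival_ratio(patch_sizes, x):.3f}")
```

The ratio check is self-similarity in miniature. Roughly the same fraction of patches survives each doubling of size, whether one looks at small patches or large ones.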

The other key ingredient of the effects that 
the authors observe is connectivity. Macro- 
scopic superconductivity is ultimately a 
charge-transport phenomenon (for electric- 
ity to flow, electrons must be transported from 
one side of the sample to the other), and this 
transport is dominated by connectivity. With- 
out connections between different domains, 
supercurrent cannot flow through the sample, 
and the material fails to be a practical, bulk 
superconductor. 
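The role of connectivity can be illustrated with a toy model. The Python sketch below is a hypothetical illustration, not an analysis from Campi et al. Superconducting domains are cells marked 1 on a square grid. A supercurrent is assumed to cross the sample only if a chain of nearest-neighbour 1s links the left edge to the right edge, which is checked with a breadth-first search. Two large domains joined by a single bridging cell conduct, but erasing that one cell disconnects the whole sample.

```python
from collections import deque

def spans_left_to_right(grid):
    """Return True if a path of nearest-neighbour 1-cells links the left edge to the right edge.

    1 marks a (hypothetical) superconducting domain, 0 a non-superconducting region.
    """
    rows, cols = len(grid), len(grid[0])
    start = [(r, 0) for r in range(rows) if grid[r][0] == 1]
    seen = set(start)
    queue = deque(start)
    while queue:
        r, c = queue.popleft()
        if c == cols - 1:
            return True  # reached the right edge
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False

# Two large domains joined by a single bridging cell at row 1, column 2.
domains = [
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
]
connected_before = spans_left_to_right(domains)

# Disorder erases that one link.
damaged = [row[:] for row in domains]
damaged[1][2] = 0
connected_after = spans_left_to_right(damaged)

print("bridge intact:", connected_before)
print("bridge removed:", connected_after)
```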

A disordered charge distribution can there- 
fore be devastating to connectivity in super- 
conducting materials. One crucial connection 
erased by disorder can unlink an entire sys- 
tem. Disorder can also affect the nature of the 
changes in physical properties that accom- 
pany the onset of phase transitions (including 
superconductivity), by smearing out an abrupt transition, lowering the temperature at which it happens or changing the geometry of the fractal charge distribution associated with a smooth phase transition. Such disorder can make it much harder for a system to equilibrate, causing the changes in its properties to lag in response to external inputs (hysteresis) and to be dependent on past inputs (memory effects). However, these effects can also be turned into an opportunity to control domain morphology through system-training protocols®, much like the way in which commercial permanent magnets are prepared in a magnetized state. This means that the disordered states that the authors have discovered might be exploited to manipulate superconductivity along similar lines.

One limitation of Campi and colleagues’ study is that they did not directly observe the morphology of the path that the superconducting electrons take. Rather, they inferred it from the morphology of the observed variable charge distribution. More data are needed to probe the intermediate length scales between the nanometre and the macroscopic, so as to chart the true path of the superconducting electrons. Future work should also investigate how the spatial pathways of superconductivity are affected by the complex interplay between disordered electron distributions and charge-density waves.


ATMOSPHERIC SCIENCE


The death toll from air-pollution sources


Estimates of worldwide deaths associated with exposure to fine particles in atmospheric pollution provide some surprising results. The findings will guide future research and act as a wake-up call for policymakers. SEE LETTER P.367


MICHAEL JERRETT


In this issue, Lelieveld et al. (page 367) estimate the number of worldwide deaths each year caused by seven sources of air pollution. To do this, they used advanced global atmospheric-chemistry models, detailed country-level population and health data, and integrated exposure-response (IER) functions: statistical models that describe how mortality varies with exposure to fine particulate air pollution. The atmospheric-chemistry model allowed the researchers to attribute air pollution and premature deaths in different regions to emissions associated with various sectors of the economy.

More than 3.2 million deaths per year have been attributed’ to exposure to outdoor particulate matter known as PM2.5, particles less than 2.5 micrometres in diameter that can penetrate deep into the lungs and cause a wide range of health problems. Many parts of the United States and Europe have seen substantial improvements in air quality over recent decades as a result of regulatory interventions, and growing evidence™ suggests that these improvements benefit public health. But other regions, particularly countries in Asia with vast populations, continue to have poor air quality” (Fig. 1), with the emissions of several key pollutants expected to increase in the future®. The overlap of high pollution and large populations takes a huge toll on public health, but little is known about the pollution sources that are responsible for premature deaths. Enter Lelieveld and colleagues. The authors’ results are surprising and potentially important for protecting public health globally.

First, they estimate that ambient PM2.5 from commercial and residential energy sources contributes the most to premature deaths worldwide. These sources include solid fuel such as coal and biomass used for heating and cooking, local waste disposal and diesel generators. Such sources account for 32% of the premature deaths in China and 50–70% of those in India and other Asian nations.

The IER functions’ that the authors used pool epidemiological exposure-response information for mortality associated with exposure to outdoor particles, emissions from biomass burning, and tobacco smoke (both from active smoking and second-hand exposure). For deaths attributable to stroke and cardiovascular disease, the IER curve is steeper at low exposures (implying that the mortality effects of increases in PM2.5 are greater at lower particulate levels), but generally flattens at higher exposures. Large uncertainties in the IER for PM2.5 occur in the exposure range of approximately 30–100 micrograms per cubic metre (ref. 7), because no information on cardiovascular mortality due to outdoor PM2.5 is available at these concentrations, and because only a few studies of second-hand smoke exposure exist. A caveat to Lelieveld and colleagues’ estimates of premature deaths from commercial and residential energy sources in Asian countries is that they fall mostly in this range of high uncertainty.

Studies of the effects of biomass burning on cardiovascular disease or stroke at any level of exposure are also lacking’. Furthermore, the largest study so far to examine how sources of fine-particle air pollution affect heart-disease mortality’ found no effects for ambient PM2.5 from biomass burning in the United States. Nevertheless, as the authors point out, even if it is assumed that biomass burning and commercial and residential energy use do not contribute to mortality associated with heart disease, such energy use remains the largest factor for global mortality associated with air pollution overall, although the estimated total number of deaths declines under that assumption.

Lelieveld and colleagues’ next major finding is that agricultural sources are the second-largest contributor to global mortality from PM2.5: releases of ammonia from livestock and fertilizers lead to the atmospheric formation of ammonium nitrate and sulfate particles. Agriculture is the leading source of PM2.5-related mortality in the eastern United States, Russia, Turkey, Korea, Japan and Europe, contributing to more than 40% of these deaths in many European countries.
Figure 1 | Burning waste in India. Lelieveld et al.’ estimate that fine particles generated from commercial and residential energy use, including waste burning,
contribute the most to pollution-associated premature deaths globally, especially in India and other Asian countries. 


This finding assumes that ammonium 
nitrate and sulfate have the same toxicity as 
other constituents of the atmospheric parti- 
cle mixture. Some epidemiological studies” 
do indeed report adverse effects from these 
particles, but many toxicological data indi- 
cate that they have little biological potency at 
ambient levels’. The contradictory evidence 
for ammonium sulfate probably arises because 
these particles are often mixed with metals and 
other toxic components from coal or industrial 
sources’’. It could therefore be that Lelieveld 
et al. overestimate the effects of particles from 
agricultural sources. The finding is highly 
valuable, however, because agriculture has 
generally not been seen as a major source of 
air pollution or premature death, and because 
it suggests that much more attention needs to 
be paid to agricultural sources, by both scien- 
tists and policymakers. 

Third, the researchers find that traffic-related pollution accounts for about 20% of the deaths from PM2.5 in the United States, the United Kingdom and Germany, but only 5% globally. The spatial resolution of their global assessment (which considers sub-areas of approximately 110 × 110 km) cannot capture the effects of finer-scale variation in traffic pollution. Other studies'"” have found that variation in pollution about 50–500 metres from the roadside correlates with mortality. Mounting evidence’ also points to heightened effects on health and mortality from the components and reaction products of traffic emissions compared with other emission sources. Thus, the effects from traffic might be underestimated by Lelieveld and colleagues. But the findings send out two crucial messages: traffic emissions remain a major source of premature death in Western countries even after extensive regulatory action, and the rapid rate of growth in traffic in many regions may well lead to increased pollution and more premature deaths in the near future.

Finally, the authors project a doubling of mortality from air pollution by 2050 on the basis of projected rates of increase in pollution and population levels. This projection should sound alarm bells for public-health agencies around the world. It also raises the question of which sources should be reduced in different regions. The answer depends on how much trust we put in the IER curve. Because the steep part of the curve is at lower levels of ambient PM2.5, large benefits can accrue from relatively small reductions in air pollution in cleaner regions, whereas the flatness of the curve at high levels necessitates large reductions in the polluted areas of Asia to achieve major health benefits”.
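The asymmetry described above follows from the concave shape of the IER curve. The Python sketch below uses the functional form of the Global Burden of Disease IER curves (Burnett et al., 2014). For a concentration $z$ above a counterfactual level $z_{cf}$, RR(z) = 1 + α[1 − exp(−γ(z − z_cf)^δ)], and RR = 1 below $z_{cf}$. Deaths are then attributed to exposure through the attributable fraction (RR − 1)/RR. All parameter values here are hypothetical, chosen only to give a concave curve. They are not the fitted values used by Lelieveld et al., and real IER curves differ by cause of death. The sketch compares the drop in relative risk from the same 10 µg m⁻³ cut in a cleaner setting (20 to 10) and in a heavily polluted one (110 to 100).

```python
import math

# Hypothetical IER parameters (illustrative only, not fitted values).
ALPHA, GAMMA, DELTA, Z_CF = 1.0, 0.05, 0.6, 7.3  # Z_CF in µg/m^3

def relative_risk(z):
    """IER form: RR = 1 + alpha * (1 - exp(-gamma * (z - z_cf)**delta)); RR = 1 at or below z_cf."""
    if z <= Z_CF:
        return 1.0
    return 1.0 + ALPHA * (1.0 - math.exp(-GAMMA * (z - Z_CF) ** DELTA))

def attributable_deaths(baseline_deaths, z):
    """Deaths attributable to exposure z, using the attributable fraction (RR - 1) / RR."""
    rr = relative_risk(z)
    return baseline_deaths * (rr - 1.0) / rr

# The same absolute cut (10 µg/m^3) applied at a low and at a high concentration.
gain_clean = relative_risk(20.0) - relative_risk(10.0)
gain_polluted = relative_risk(110.0) - relative_risk(100.0)
print(f"RR reduction, 20 -> 10 µg/m^3:   {gain_clean:.3f}")
print(f"RR reduction, 110 -> 100 µg/m^3: {gain_polluted:.3f}")
print(f"attributable deaths per 10,000 baseline deaths at 20 µg/m^3: "
      f"{attributable_deaths(10_000, 20.0):.0f}")
```

With δ < 1, both the power term and the exponential term make the slope of RR fall as concentration rises. That is why an equal cut buys more in cleaner air under these assumed parameters.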

Lelieveld and colleagues’ findings suggest that about 1 million lives could be saved every year by reducing ambient exposure to pollution. A further 3.54 million lives per year could be saved by lowering indoor exposures from similar sources’, mainly through changes in commercial and residential energy use. Incentivizing the use of cleaner fuels or of electricity for local energy needs would reduce mortality from both indoor and ambient PM2.5 exposure and should be a priority in Asia and other regions that rely on solid fuels. For many parts of the world, more research is needed if we are to understand the impacts of agricultural practices on air pollution and mortality, and especially to determine the toxicity of ammonium nitrate and sulfate emanating from this source. And in countries that already have low ambient levels of pollution, sizeable benefits can still be achieved by reducing emissions from fossil-fuel power plants and traffic.
Labelling and optical erasure of synaptic memory traces in the motor cortex


Akiko Hayashi-Takagi, Sho Yagishita, Mayumi Nakamura, Fukutoshi Shirai, Yi I. Wu, Amanda L. Loshbaugh, Brian Kuhlman, Klaus M. Hahn & Haruo Kasai


Dendritic spines are the major loci of synaptic plasticity and are considered possible structural correlates of memory. Nonetheless, systematic manipulation of specific subsets of spines in the cortex has been unattainable, and thus the link between spines and memory has been correlational. We developed a novel synaptic optoprobe, AS-PaRac1 (activated synapse targeting photoactivatable Rac1), that can specifically label recently potentiated spines and induce the selective shrinkage of AS-PaRac1-containing spines. In vivo imaging of AS-PaRac1 revealed that a motor learning task induced substantial synaptic remodelling in a small subset of neurons. The acquired motor learning was disrupted by the optical shrinkage of the potentiated spines, whereas it was not affected by the identical manipulation of spines evoked by a distinct motor task in the same cortical region. Taken together, our results demonstrate that a newly acquired motor skill depends on the formation of a task-specific dense synaptic ensemble.


Optogenetics is a powerful tool for controlling neuronal action potentials'’, and has been used to demonstrate the crucial role of cell assemblies in representing memory traces’. However, owing to the limited spatial resolution of currently available probes, manipulation of individual dendritic spines, the major sites of excitatory synapses* °, has been unfeasible, hindering a comprehensive understanding of synaptic reorganization during learning. Thus, for spine-specific light control, we took advantage of a structural property of spines: the tight correlation between spine volume and function*’. Because the prolonged activation of the small GTPase Rac1 induces spine shrinkage*"', we used a photoactivatable form of Rac1 (PaRac1)’” to induce spine shrinkage, which allowed us to control synaptic transmission with light. Moreover, because it has long been suggested that the memory trace is allocated to specific neurons and spines of neurocircuits’*"*, here we targeted PaRac1 to the activated synapses (activated synapse targeting PaRac1, AS-PaRac1) to establish a novel method, termed ‘synaptic optogenetics’, to visualize and manipulate the memory trace.


AS-PaRac1 labels the potentiated spines


We first re-engineered the original PaRac1 construct’* to optimize its properties for synaptic manipulation. Introduction of L514K and L531E mutations into the original construct markedly reduced the undesirable Rac1 background activity in the dark, as shown by isothermal titration calorimetry (ITC), neuronal morphology and co-immunoprecipitation (Extended Data Fig. 1a–c). Next, PaRac1 was fused with a deletion mutant of PSD-95 (PSDΔ1.2)", which is known to concentrate at the postsynaptic site but cannot bind the major PDZ-binding proteins, thus minimizing the undesirable effects of PSD-95 overexpression. An enrichment index, the quantitative ratio of synaptic localization to that in the dendritic shaft (see Methods), supported the effective accumulation of PSD-PaRac1 at the synapse, especially at the tip of the spine (Fig. 1a, construct B), where it was highly co-localized with endogenous PSD-95 but not with an axonal marker (Extended Data Fig. 1d). Finally, for neuronal input specificity, we exploited the dendritic targeting element (DTE) of Arc mRNA", which is selectively targeted and translated in activated dendritic segments in response to synaptic activation in an NMDA (N-methyl-D-aspartate) receptor-dependent manner’”"’. Interestingly, PSD-PaRac1-DTE sparsely labelled spines (Fig. 1a, construct C, arrowheads). Quantification using a hot spot index (see Methods), which indicates how unevenly PaRac1 variants were distributed, suggested that both PSDΔ1.2 and DTE were necessary for this characteristic distribution (Fig. 1a, constructs C and E). Therefore, the combination of PSDΔ1.2 and DTE was termed the ‘AS (activated synapse targeting) cassette’, and the PaRac1 sequence flanked with the AS cassette was named AS-PaRac1 (Fig. 1a, construct C).

Figure 1 | Potentiation-dependent accumulation of AS-PaRac1 to the dendritic spines in hippocampal slice cultures. a, Mapping of the domains essential for the discrete distribution of the probe (arrowheads). Enrichment and hot spot indices are plotted in arbitrary units. b, Representative images of single-spine potentiation by glutamate uncaging (arrowheads) in the presence or absence of forskolin (FSK) and anisomycin (Ani). Mg, no Mg2+; 1Mg, 1 mM MgCl2. c, d, Time courses of spine head volume (V, c) and AS-PaRac1 accumulation (AS, d), both measured by fluorescence intensity. The mean change 60 min after uncaging in the stimulated or neighbouring spines. e, The effect of lactacystin on the discrete accumulation of AS-PaRac1 (arrowheads). DIV, days in vitro. Scale bars, 2 µm. Error bars represent s.e.m. Detailed information on statistical methods and results is given in Extended Data Table 1.
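The two indices are defined in the paper's Methods, which are not part of this excerpt. As a hedged illustration only, the Python sketch below adopts a simple reading of the text. The enrichment index is taken as the ratio of mean spine fluorescence to mean dendritic-shaft fluorescence. The hot spot index is taken as the coefficient of variation of fluorescence across spines, which is higher when a few spines carry most of the signal. Both formulas, the function names and the intensity values are hypothetical, not the authors' definitions or data.

```python
from statistics import mean, pstdev

def enrichment_index(spine_intensities, shaft_intensities):
    """Hypothetical: mean spine fluorescence divided by mean dendritic-shaft fluorescence."""
    return mean(spine_intensities) / mean(shaft_intensities)

def hot_spot_index(spine_intensities):
    """Hypothetical: coefficient of variation across spines (0 for a perfectly even distribution)."""
    return pstdev(spine_intensities) / mean(spine_intensities)

shaft = [1.0, 1.1, 0.9, 1.0]
uniform_probe = [2.0, 2.1, 1.9, 2.0, 2.0]  # enriched in every spine
sparse_probe = [0.8, 0.9, 6.0, 0.8, 1.5]   # concentrated in a few spines
for name, spines in [("uniform", uniform_probe), ("sparse", sparse_probe)]:
    print(name,
          "enrichment:", round(enrichment_index(spines, shaft), 2),
          "hot spot:", round(hot_spot_index(spines), 2))
```

In this toy case both probes are equally enriched at synapses, but only the sparse one has a high hot spot index. That mirrors the distinction the text draws between synaptic accumulation (PSDΔ1.2) and sparse, uneven labelling (which also needs the DTE).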

Next, we tried to unravel what this new synaptic probe labelled. Bicuculline, which increases neuronal excitation, robustly increased the number of AS-PaRac1-containing spines, and a reduction in the hot spot index revealed that the distribution of AS-PaRac1 became relatively uniform upon bicuculline treatment. In contrast, blockade of action potentials by tetrodotoxin (TTX) decreased the accumulation of the probe, resulting in a reduction in its spine enrichment index (Extended Data Fig. 2a–d). Because these findings suggested that synaptic activation regulates the localization of AS-PaRac1, we hypothesized that AS-PaRac1 accumulates in recently potentiated spines. Indeed, when AS-PaRac1 was co-transfected with SEP-GluA1, a marker of the synaptic incorporation of the AMPA (α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid) receptor subunit GluR1 (refs 20, 21), the fluorescence signals of the two probes inside each spine were significantly correlated (Extended Data Fig. 2e, arrowheads). Furthermore, protein synthesis-dependent potentiation during the single-spine LTP protocol, which was elicited by glutamate uncaging and the adenylyl cyclase activator forskolin (FSK)”4, induced the accumulation of AS-PaRac1 in the stimulated spines, whereas protein synthesis-independent plasticity (glutamate uncaging alone) did not. Consistently, the protein synthesis inhibitor anisomycin abolished the FSK-induced AS-PaRac1 accumulation (Fig. 1b–d). No increase in AS-PaRac1 fluorescence was observed in neighbouring spines, indicating that AS-PaRac1 accumulation was restricted to the stimulated spine (Fig. 1d). The DTE sequence was necessary for activity-dependent AS-PaRac1 accumulation (Extended Data Fig. 2f, g), supporting the idea that locally translated AS-PaRac1, unlike somatically translated AS-PaRac1, was preferentially recruited to enlarged spines. PaRac1 did not exhibit an uneven distribution unless the construct contained the PSD-95 domain (Fig. 1a, construct D). Because PSD-95 is rapidly degraded by proteasomes”, we examined the effect of the proteasome inhibitor lactacystin and found that it completely disrupted the unique distribution of the probe (Fig. 1e). Taken together, we concluded that AS-PaRac1 is a probe that specifically labels enlarged and newly generated spines (see Extended Data Fig. 3 for detailed cellular mechanisms). These are referred to as ‘structurally potentiated spines’, and the potentiation labelled by AS-PaRac1 is described as ‘potentiated spine’ hereafter.


Spine labelling by AS-PaRac1 in vivo


To characterize this probe in vivo, we used the rotarod training as the 
model of motor learning. Because motor learning is impaired in Arc 
knockout mice*’, we assumed that the induction of AS-PaRacl by the 
Arc promoter” would enhance specific labelling during learning- 
induced potentiation. Arc::AS-PaRacl was delivered to the cortical 
layer II/III of the primary motor cortex (M1), where a robust reor- 
ganization of neuronal circuits is induced upon motor learning’. 
Cranial window surgery for two-photon imaging was performed 
based on the stereotaxic coordinates of the previous functional map- 
ping for the hind limb area*’. Spine volume and AS-PaRacl fluor- 
escence was compared quantitatively before and after training 
(Fig. 2a-e). Consistent with previous findings****, even in the train- 
ing-free period, a substantial number of spines ‘spontaneously’ under- 
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went structural potentiation (formation or enlargement of spines; see 
the definition in Extended Data Fig. 4a), but the trained mice exhib- 
ited significantly more structural potentiation compared with the 
non-trained mice (Fig. 2d). Notably, synaptic fluorescence of AS- 
PaRacl just after training (0 day) strongly correlated with the change 
in spine size upon training (Fig. 2f). It is unlikely that the accumula- 
tion of AS-PaRacl caused the potentiation or labelled the spines 
primed for potentiation such as for the ‘tagged synapse’**”*, because 
the initial quantity of AS-PaRacl before learning (—1 day) did not 
correlate with the change in spine size after learning (Fig. 2g). Analysis 
of AS-PaRacl puncta in the dendritic shaft suggested that the majority 
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Figure 2 | Spatiotemporal dynamics of AS-PaRacl labelling in vivo during 
the rotarod task. a, Schematic of the Arc promoter’’-driven AS-PaRacl. 

b, Experimental design. EP, electroporation; DsRed, Discosoma sp. red 
fluorescent protein. c, Images of spine formation (arrows) and spine 
enlargement (arrowheads). Green circles, AS-PaRacl; magenta circles, spines 
that initially acquired AP-PaRacl, but lost it afterward, but the structural 
change was persistent. d, Fraction of structural change of spines. 

e-g, Quantification of spine size and AS-PaRacl (e). Size measured by 
fluorescence intensity, arbitrary units (a.u.). Relationship between AS-PaRacl 
and volume change (AV) after (0 day, f) and before (—1 day, g) learning. 

h, Percentage of AS-PaRacl-containing spines (AS-PaRacl = 1 a.u., area 
shaded green in f). i, Mapping of AS-PaRacl. Potentiations before, just after, 1 
day after, and 2 day after learning are separately depicted as ‘Before’, ‘Learning’, 
‘After-1’, and ‘After-2, respectively. j, Retention of AS-PaRacl (green) or 
structural potentiation (magenta). k, Trajectory of spine size and AS-PaRacl 
intensities of the structurally potentiated spines. Scale bars, 2 |1m for ¢; 200 pm 

of AS-PaRac1 signal was located in the dendritic spines, and labelling of shaft synapses was negligible (Extended Data Fig. 5). When we set the threshold of AS-PaRac1 at 1 a.u. (Fig. 2f, green shaded area, 0 day), AS-PaRac1 detected spine formation and enlargement with sensitivities of 83 ± 7.9% and 69 ± 3.0% (mean ± standard error of the mean), respectively (Fig. 2h), whereas labelling in other spine types was 2.3 ± 0.12%. Because Arc::AS-PaRac1 was induced only in AS-PaRac1-positive neurons, we also calculated the labelling properties within AS-PaRac1-positive neurons. The sensitivities for formation and enlargement were 94 ± 2.7% and 95 ± 4.8%, respectively, while false labelling in other spine types was 12.9 ± 4.2% (306 spines, 6 AS-PaRac1-positive neurons in 3 mice). Therefore, AS-PaRac1 is a reliable marker of potentiated spines in vivo.
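
The sensitivity and false-labelling figures above come from thresholding each tracked spine's AS-PaRac1 intensity. The following is a minimal sketch, not the authors' analysis code: it assumes each spine is a hypothetical `Spine` record holding a structural category ("formation", "enlargement" or "other") and a background-subtracted AS-PaRac1 intensity in a.u., and it assumes a spine at or above the 1 a.u. threshold counts as labelled.

```python
from dataclasses import dataclass

THRESHOLD_AU = 1.0  # AS-PaRac1 labelling threshold from the text (1 a.u.)


@dataclass
class Spine:
    category: str        # hypothetical labels: "formation", "enlargement" or "other"
    as_parac1_au: float  # background-subtracted AS-PaRac1 intensity (a.u.)


def is_labelled(spine: Spine, threshold: float = THRESHOLD_AU) -> bool:
    """Assumption: a spine at or above the threshold counts as labelled."""
    return spine.as_parac1_au >= threshold


def labelling_rate(spines: list[Spine], category: str) -> float:
    """Percentage of spines in `category` that carry AS-PaRac1.

    For "formation" or "enlargement" this is the sensitivity; for "other"
    it is the false-labelling rate.
    """
    group = [s for s in spines if s.category == category]
    if not group:
        raise ValueError(f"no spines in category {category!r}")
    return 100.0 * sum(is_labelled(s) for s in group) / len(group)


# Toy data, for illustration only.
spines = [
    Spine("formation", 1.8), Spine("formation", 0.4),
    Spine("enlargement", 1.2),
    Spine("other", 0.1), Spine("other", 1.5),
]
for cat in ("formation", "enlargement", "other"):
    print(cat, labelling_rate(spines, cat))
```

The sketch pools all spines; the values in the text are reported as mean ± s.e.m., presumably computed per animal first and then averaged.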

Next, we performed wide-view mapping of task-evoked potentiation using this probe (Fig. 2i, learning period) and found that task-evoked potentiation was elicited in 2.3 ± 0.13% of spines and 16.4 ± 2.8% of neurons in the imaged area. We tracked almost the whole image of each neuron (Extended Data Fig. 4b) and confirmed that when a spine was labelled by AS-PaRac1, its parental soma also expressed AS-PaRac1 (6 AS-PaRac1-positive somata). Consistently, we found no AS-PaRac1-positive spines on neurons with AS-PaRac1-negative somata (46 negative somata). The number of AS-PaRac1 puncta per AS-PaRac1-positive neuron could therefore be approximated, which suggested that 14.7 ± 2.01% of spines contained AS-PaRac1 in the AS-PaRac1-positive neurons. This implies that, upon motor learning, substantial remodelling of spines (14.7%) was evoked in a small neuronal population (16.4%) in layer II/III (see Extended Data Fig. 4d for the detailed calculation). Similarly, substantial remodelling was also observed in a small population of layer V neurons (Extended Data Fig. 4d).
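
As a rough cross-check of the per-neuron figure, and assuming (our simplification, not stated in the text) that labelled spines occur only on AS-PaRac1-positive neurons and that all neurons carry similar numbers of spines, dividing the population-level spine fraction by the positive-neuron fraction gives 2.3% / 16.4% ≈ 14.0%, close to the directly counted 14.7 ± 2.01%. The function name below is hypothetical.

```python
def fraction_in_positive_neurons(pct_spines: float, pct_neurons: float) -> float:
    """Estimate the % of spines labelled within AS-PaRac1-positive neurons.

    Assumes (our simplification) that labelled spines occur only on positive
    neurons and that every neuron carries a similar number of spines.
    """
    if not 0 < pct_neurons <= 100:
        raise ValueError("pct_neurons must be in (0, 100]")
    return 100.0 * pct_spines / pct_neurons


estimate = fraction_in_positive_neurons(2.3, 16.4)
print(f"{estimate:.1f}%")  # 14.0%
```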

To characterize the synaptic retention of AS-PaRac1 for photoactivation experiments in vivo, we tracked the individual spines that acquired AS-PaRac1 and schematized them separately according to the day on which AS-PaRac1 appeared (Fig. 2c, i). The persistence of synaptic AS-PaRac1 and of structural potentiation varied markedly among spines: some were preserved beyond 1 day after training (Fig. 2c, dendrite no. 1), while others disappeared (Fig. 2c, dendrites no. 2 and 3). Importantly, structural potentiation and AS-PaRac1 labelling triggered during the ‘Learning’ period were more likely to be preserved than those triggered during the training-free period (Fig. 2j, ‘Before’ and ‘After-1’ (potentiation 1 day after learning)). Consistently, longitudinal imaging of the structurally potentiated spines revealed that the majority of those retaining AS-PaRac1 for 24 h maintained structural potentiation for at least 48 h (Fig. 2k, green trace), whereas structurally potentiated spines lacking AS-PaRac1 retention returned to the pre-potentiated state (Fig. 2k, black trace). Such AS-PaRac1 retention might be maintained by reverberation of learning-activated neuronal circuits, because AS-PaRac1 was expressed only in Arc-expressing neurons, in which persistent activation helps to maintain plastic changes in the neocortex.
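
The comparison in Fig. 2k amounts to splitting structurally potentiated spines by whether AS-PaRac1 is still present at 24 h, then asking what fraction remain enlarged at 48 h. This is a minimal sketch under stated assumptions: the `TrackedSpine` record is hypothetical, the 1 a.u. labelling threshold is taken from the text, and the "still potentiated" criterion (size at 48 h at least 1.2 times baseline) is our own illustrative choice, not the authors'.

```python
from dataclasses import dataclass


@dataclass
class TrackedSpine:
    baseline_size: float   # size before potentiation (a.u.)
    size_48h: float        # size 48 h after potentiation (a.u.)
    as_parac1_24h: float   # AS-PaRac1 intensity 24 h after potentiation (a.u.)


def still_potentiated(spine: TrackedSpine, min_ratio: float = 1.2) -> bool:
    # Hypothetical criterion: still at least 20% larger than baseline at 48 h.
    return spine.size_48h / spine.baseline_size >= min_ratio


def maintenance_by_retention(spines, label_threshold=1.0, min_ratio=1.2):
    """% of spines still potentiated at 48 h, split by AS-PaRac1 retention at 24 h."""
    groups = {"retained": [], "lost": []}
    for s in spines:
        key = "retained" if s.as_parac1_24h >= label_threshold else "lost"
        groups[key].append(s)
    return {
        key: 100.0 * sum(still_potentiated(s, min_ratio) for s in members) / len(members)
        if members else float("nan")
        for key, members in groups.items()
    }


# Toy data, for illustration only.
spines = [
    TrackedSpine(1.0, 1.5, 2.0), TrackedSpine(1.0, 1.3, 1.1),
    TrackedSpine(1.0, 1.0, 0.2), TrackedSpine(1.0, 1.25, 0.5),
]
print(maintenance_by_retention(spines))
```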


Selective spine shrinkage by AS-PaRac1


Consistent with previous findings that prolonged Rac1 activation induces spine shrinkage, we found that low-frequency photoactivation elicited spine shrinkage (Extended Data Fig. 6). Intriguingly, spine shrinkage was significantly more robust when the AS-PaRac1 construct was driven by the Arc promoter than by the constitutive CAG promoter. Arc expression is increased by persistent neuronal activity, which induces chronic activation of endogenous Rac1 and may contribute to the robust spine shrinkage by Arc::AS-PaRac1. Photoactivation-induced spine shrinkage was Rac1-dependent, because deleting Rac1 from AS-PaRac1 while keeping its other domains intact (Arc::PSDΔ1.2-LOV-DTE) completely abolished the shrinkage effect (Extended Data Fig. 6). To achieve spine shrinkage over a large cortical area in vivo, bilateral optical fibres were placed onto the cranial window (Fig. 3a and Extended Data Fig. 7). Low-frequency pulsed photoactivation triggered shrinkage specifically in AS-PaRac1-containing spines (Fig. 3b, c). The effect of photoactivation was comparable at least within 100 µm of the dura, suggesting that at least the spines in layer I were affected (Fig. 3d). Photoactivation-induced spine shrinkage was accompanied by functional depotentiation, as shown by excitatory postsynaptic calcium transients: the extent of spine shrinkage correlated with the decrease in amplitude, but not with the decrease in frequency (Fig. 3f-j). Spine shrinkage and the subsequent functional changes were spine-specific rather than branch- or cell-wide, because shrinkage was not triggered in neighbouring AS-PaRac1-negative spines, and the calcium transient was affected neither in the neighbouring spines nor in the soma (Fig. 3f-j; Extended Data Figs 6 and 7b).


Optical erasure of acquired skills 

To demonstrate the effect of spine shrinkage on learning, mice were bilaterally injected with adeno-associated virus serotype 5 (AAV5), which transduced layers I to V (Extended Data Fig. 7f). Mice were divided into two groups: the first group was transfected with monomeric red fluorescent protein (mRFP) alone as a control, and the second with AS-PaRac1 and mRFP. Both groups exhibited significantly better motor performance after training, but only the performance of the AS-PaRac1 group was disrupted by photoactivation (protocol 1, Fig. 4a, b), and the extent of learning disruption induced by photoactivation (photoactivation effect) negatively correlated with the extent of training-evoked improvement (learning attainment) (Fig. 4f). In contrast, the control group showed neither disruption of acquired learning nor a correlation between the effects of training and photoactivation. Because photoactivation did not affect the running speed of the same cohort used in Fig. 4b, it is unlikely that photoactivation disturbed general motor performance (Extended Data Fig. 8).

Figure 3 | Selective shrinkage of AS-PaRac1-containing spines upon photoactivation (PA). a, Illustration of photoactivation. b, Images of the hind limb regions of the cortex. SARE::AS-PaRac1 and CAG::mRFP were transduced by in utero EP (a-d). c, Spine size following photoactivation. Dark green circles are eliminated spines. d, The effect of cortical depth on

Photoactivation disrupted the acquired learning even 1 day after learning (protocol 2; Fig. 4c, g), when the majority of learning-evoked spines contained AS-PaRac1 (Fig. 2k). In contrast, photoactivation 2 days after learning (protocol 3), when both the number of AS-PaRac1-containing spines and the intensity of AS-PaRac1 labelling were markedly decreased (Fig. 2k; Extended Data Fig. 4), failed to disrupt acquired learning (Fig. 4d, h). Owing to daily spontaneous potentiation, a comparable number of spines contained AS-PaRac1 in protocols 2 and 3 (Extended Data Fig. 4c). Nonetheless, only protocol 2 disrupted the acquired skill, suggesting that the learning-evoked spine potentiation visualized by AS-PaRac1 (at +1 day), but not spontaneous potentiation (at +2 days), accounted for the cortical memory traces.

To demonstrate the task-specific role of synaptic ensembles, mice injected with AS-PaRac1-expressing AAV into bilateral M1 were subjected to a dual-task protocol. Mice sequentially learned two distinct hind limb tasks: the rotarod task during the first 2 days and the beam task during the second 2 days (Fig. 4i). We performed photoactivation on day 4, because the majority of the rotarod-evoked AS-PaRac1 puncta had diminished by this time point (Fig. 2k). We confirmed that the two tasks evoked comparable numbers of potentiated spines (Extended Data Fig. 7c). Whereas beam-task performance was not disrupted by sham photoactivation (the fibre was inserted, but no illumination was applied), photoactivation disrupted the acquired beam-task performance without affecting rotarod performance (Fig. 4j). We found no correlation between the effects of photoactivation on the rotarod and beam tasks, which implies that the synaptic ensembles recruited by each task did not overlap (Fig. 4k).


Task-specific synaptic ensemble 


To visualize the synaptic ensembles formed during dual-task learning, mice were sparsely labelled with AS-PaRac1 and subjected to the dual-task protocol described above (Fig. 5a, dual task). AS-PaRac1 puncta were classified by their time of emergence (Fig. 5b) and schematized as blue dots for rotarod-task potentiation (day 2 specific), yellow dots for beam-task potentiation (day 4 specific) and green dots for continuous potentiation across both periods (days 2 and 4). Interestingly, more than half of the beam-evoked potentiated spines were new (Fig. 5n), that is, not previously potentiated in the rotarod task (yellow, Fig. 5c-e). Taken together with the behavioural data (Fig. 4i-k), these results demonstrate that the two learning tasks induced potentiation of distinct synaptic ensembles.
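
The colour classification in Fig. 5 is a set partition of AS-PaRac1-positive spines by imaging day. This is a minimal sketch, assuming each punctum can be tracked across days as a spine identifier; the identifiers and function names are hypothetical.

```python
def classify_puncta(day2: set[str], day4: set[str]) -> dict[str, set[str]]:
    """Assign AS-PaRac1-positive spines to the colour classes of Fig. 5.

    day2 / day4: IDs of spines carrying AS-PaRac1 after rotarod (day 2)
    and after beam (day 4) training.
    """
    return {
        "blue": day2 - day4,    # rotarod-specific (day 2 only)
        "yellow": day4 - day2,  # beam-specific (day 4 only)
        "green": day2 & day4,   # potentiated in both periods
    }


def new_fraction(day2: set[str], day4: set[str]) -> float:
    """Fraction of day-4 (beam-evoked) potentiation not present on day 2."""
    if not day4:
        raise ValueError("no day-4 puncta")
    return len(day4 - day2) / len(day4)


# Toy identifiers, for illustration only.
day2_ids = {"s1", "s2", "s3"}
day4_ids = {"s3", "s4", "s5"}
classes = classify_puncta(day2_ids, day4_ids)
print(classes, new_fraction(day2_ids, day4_ids))
```

Using set difference and intersection keeps the three classes disjoint by construction, so every punctum is counted exactly once.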

Finally, we examined whether the same spines are potentiated by the same task. Mice were divided into two groups (Fig. 5a). The first group was trained on the rotarod task for the first 2 days; the learning-evoked potentiation was then shrunk by photoactivation, and the mice were re-trained on the identical rotarod task (re-training condition). The second group underwent the rotarod task and subsequent photoactivation but was not trained for another 2 days (home cage condition). We found that the majority of the optically shrunk spines returned to their previously potentiated size after re-training, whereas the degree of re-potentiation was significantly lower in the home cage group, suggesting that re-training induced re-potentiation of the same subset of spines (Extended Data Fig. 7d, e). Mice assigned to the dual-task protocol were also compared, highlighting the differences in potentiation patterns among the groups during the last 2 days (Fig. 5c-n). In contrast to the dual-task group, spines potentiated during the first rotarod training were more likely to be re-potentiated after the second rotarod training in the re-training group (green, Fig. 5f-h, l, n), whereas re-potentiation was significantly less prominent in mice that did not perform the re-training task (home cage group) (Fig. 5i-k, l, n; Extended Data Fig. 7d, e). Furthermore, newly potentiated spines, which had not been potentiated in the first 2 days, were less abundant in the re-training and home cage groups than in the dual-task group (yellow, Fig. 5m, n). These findings suggest that the reorganization of distinct synaptic ensembles is specific to each learning task.

Figure 4 | Erasure of acquired learning by the photoactivation of spines labelled with AS-PaRac1. a, Experimental design (see Extended Data Fig. 9). b-d, Mice that received AAV infection of SARE::AS-PaRac1 were allocated to protocols 1 (b), 2 (c) or 3 (d). The average of three trials for each mouse was used as the task performance (grey line). e, The critical period of photoactivation for erasing acquired skills. f-h, Relationship between the effect of photoactivation and learning attainment. i, Experimental design. j, Performance trajectory of each skill. k, No correlation between the photoactivation effect on acquired rotarod performance and that on beam-task performance. Error bars represent s.e.m.
Figure 5 | Visualization of synaptic ensembles for distinct learning tasks.

Discussion


Current models of learning and memory suggest that structural plasticity of spines is the underlying mechanism of information storage in the brain. Nonetheless, clear visualization of spine structure in vivo requires sparse labelling of neurons, and analysing structural changes in spines is very laborious. In contrast, the AS-PaRac1 signal appears as fluorescent puncta, which allows potentiated spines to be detected far more easily, even under dense transfection conditions. Moreover, the role of potentiated spines can be assessed directly by photoactivation during behavioural examinations. In this study, we showed that photoactivation of the bilateral M1 cortex disrupted the acquired motor skill. We estimated that approximately 4,700 learning-evoked neurons were affected by photoactivation, based on the product (a) × (b) × (c) × (d) × (e), in which (a) is the density of neurons in the neocortex, 9.2 × 10⁴ mm⁻³ (ref. 38); (b) the photoactivated area, 0.4 mm² bilaterally (fibre core diameter = 500 µm); (c) the thickness of the cortical layers (II-V) infected with AAV, 0.8 mm; (d) the AAV infection efficiency, 80% (Extended Data Fig. 7f); and (e) the percentage of AS-PaRac1-positive neurons upon learning, 20% (Extended Data Fig. 4d). On the other hand, owing to limited light transmission, the majority of the shrunk spines resided in layer I (up to 100 µm from the dura). The minimal number of learning-evoked spines illuminated by the optical fibres was roughly 410,000 in the bilateral M1 cortex, based on the product (d) × (f) × (g) × (h), in which (f) is the density of excitatory synapses in the mouse neocortex, 6.4 × 10⁸ mm⁻³ (ref. 38); (g) the learning-evoked potentiation, approximately 2% of the spines in this area (Extended Data Fig. 4); and (h) the brain volume that received photoactivation, 0.4 mm² of photoactivation area × 0.1 mm of depth = 0.04 mm³. Corticocortical feedback projections mediating top-down influences are concentrated in layer I, where they strongly excite a subpopulation of pyramidal neurons. Learning-evoked changes in neuronal ensembles via synaptic reorganization of the M1 cortex directly predict future task performance. Nonlinear information integration occurs primarily in the dendritic tuft in behaving animals, and activation of several spines in the tuft is sufficient to initiate NMDA spikes that lead to action potential generation. Thus, the shrinkage of potentiated spines in our study (410,000 spines in the dendritic tufts of 4,700 neurons) would reasonably be expected to disrupt the substantial learning-evoked remodelling in a specific neuronal population. The formation of dense connections in a small neuronal ensemble may be consistent with the formation of functional neuronal clusters in the motor cortex after learning. Synaptic optogenetics might therefore be a powerful tool for uncovering the mechanism of synaptic plasticity and its relationship with subsequent behavioural manifestations.
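
Both order-of-magnitude estimates above are plain products of the quoted values, which makes them easy to re-check. The sketch below hard-codes the numbers from the text; the variable names are ours. It also checks term (b): two 500-µm fibre cores cover 2π(0.25 mm)² ≈ 0.39 mm², which the text rounds to 0.4 mm². The products come to about 4,710 neurons and 409,600 spines, which the text rounds to 4,700 and 410,000.

```python
import math

# Values quoted in the text; variable names are ours.
NEURON_DENSITY = 9.2e4       # (a) neurons per mm^3 (ref. 38)
AREA_MM2 = 0.4               # (b) illuminated area, both hemispheres
LAYER_THICKNESS_MM = 0.8     # (c) AAV-infected layers II-V
INFECTION_EFFICIENCY = 0.8   # (d)
POSITIVE_NEURONS = 0.20      # (e) AS-PaRac1-positive neurons upon learning
SYNAPSE_DENSITY = 6.4e8      # (f) excitatory synapses per mm^3 (ref. 38)
POTENTIATED_SPINES = 0.02    # (g) learning-evoked potentiation
DEPTH_MM = 0.1               # light penetration into layer I (~100 um)

# Sanity check of (b): two fibre cores of 500 um (0.5 mm) diameter.
area_check = 2 * math.pi * (0.5 / 2) ** 2

neurons = (NEURON_DENSITY * AREA_MM2 * LAYER_THICKNESS_MM
           * INFECTION_EFFICIENCY * POSITIVE_NEURONS)
volume_mm3 = AREA_MM2 * DEPTH_MM             # (h) illuminated volume
spines = INFECTION_EFFICIENCY * SYNAPSE_DENSITY * POTENTIATED_SPINES * volume_mm3

print(f"area check {area_check:.2f} mm^2")
print(f"~{neurons:,.0f} neurons, ~{spines:,.0f} spines")
```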


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Ethical considerations. The use and care of animals in this study followed the 
guidelines of the Animal Experimental Committee of the Faculty of Medicine at 
the University of Tokyo. 

Plasmid construction and transfection. Mutagenesis and deletion of cDNA were conducted as previously described. Briefly, the L514K and L531E mutations of the LOV2 domain were introduced with the following primers (mutations underlined): 5′-ctttattggggttcagaaggatggaactgagcatg-3′, 5′-gagagagegagtcatggagattaagaaaactgcag-3′, and with their corresponding complementary primers. PSD-95(ΔPDZ1.2) was generated by deleting nucleotides (nts) 250 to 993 based on the numbering of NM_019621. The DTE sequence of Arc mRNA was cloned from first-strand cDNA generated from the frontal cortex of postnatal day 50 (P50) Sprague-Dawley rats with the following primers (HindIII sites underlined): 5′-atgataagctttcggctccatgactcagccatgcc-3′ and 5′-atgataagcttagacacgagcagttaccaacacg-3′. The generated amplicon, which corresponded to nts 2036-2699 based on the numbering of NM_019361, was subcloned immediately downstream of the stop codon of PaRac1.
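
The "corresponding complementary primers" are presumably the reverse complements of the listed mutagenic primers, as is usual in overlap-based site-directed mutagenesis; this is our assumption, since the text does not specify the protocol. A minimal sketch for deriving them follows. The second mutagenic primer as printed contains the letter "e", which is not a nucleotide and is probably an extraction error, so the function rejects such input rather than guessing. The sketch also checks that both DTE primers carry the HindIII recognition site (AAGCTT).

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")
VALID_BASES = set("acgtACGT")
HINDIII = "aagctt"  # HindIII recognition site


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    bad = set(seq) - VALID_BASES
    if bad:
        raise ValueError(f"non-nucleotide characters: {sorted(bad)}")
    return seq.translate(COMPLEMENT)[::-1]


forward = "ctttattggggttcagaaggatggaactgagcatg"  # first mutagenic primer in the text
print("5'-" + reverse_complement(forward) + "-3'")

DTE_PRIMERS = [
    "atgataagctttcggctccatgactcagccatgcc",
    "atgataagcttagacacgagcagttaccaacacg",
]
print(all(HINDIII in p for p in DTE_PRIMERS))  # True
```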

Isothermal titration calorimetry (ITC). ITC to examine the affinity of PaRac1 for the CRIB domain of PAK1 in the lit and dark states was carried out as described previously.

PaRac1 pull-down assay. PaRac1 variants were transfected into HEK293 cells by lipofection (Lipofectamine 2000; Invitrogen, Carlsbad, CA), and the cells were divided into lit and dark groups. Cells in the lit group were illuminated with a white fluorescent lamp (1.5 W for a 10-cm dish, 19 ± 1.0 mW cm⁻²) for 10 min before cell lysis, and the subsequent immunoprecipitation was performed under continuous illumination until the final wash step of the protein precipitates. Cells in the dark group were handled under a yellow fluorescent lamp, which excluded light at wavelengths below 500 nm to avoid photoactivation. Cells were lysed in lysis buffer (150 mM NaCl, 50 mM Tris-HCl pH 7.5, 1% Triton-X (v/v), 10 mM NaF, 10% glycerol (v/v), 1 mM EDTA and protease inhibitor cocktail (Complete; Roche Diagnostics)). Lysates were sonicated intermittently in an ice-water mixture, and cell debris was cleared by centrifugation. The soluble fraction was incubated with an anti-GFP antibody (D253-3; MBL, Nagoya, Japan), followed by co-precipitation with Protein G Sepharose (GE Healthcare, Little Chalfont, UK). The precipitate was immunoblotted with an anti-PAK1 antibody (no. 2602; Cell Signaling, Beverly, MA). The signal intensity of each band (net signal after subtracting the background signal measured in the region adjacent to the band) was measured using ImageJ software (National Institutes of Health, Bethesda, MD).

Immunofluorescence. Cell staining was performed as described previously. Briefly, dissociated rat cortical neurons at 21 days in vitro (DIV) were fixed with 4% paraformaldehyde (PFA) for 30 min at room temperature. Mice were euthanized after the behavioural analyses, and their brains were perfusion-fixed with 4% PFA and sectioned coronally into 150-µm-thick sections. Fixed samples were then permeabilized with Perm/Blocking buffer (2.5% normal goat serum (v/v) in phosphate-buffered saline (PBS) with 0.3% Triton X-100 (v/v)) for 1 h at room temperature. Samples were incubated for 24 h at 4 °C with the following primary antibodies: anti-phospho-neurofilament (SMI-31; Merck KGaA, Darmstadt, Germany) as an axonal marker; anti-PSD-95 (6G6; Abcam, Cambridge, UK); and anti-Emx1 (sc-28220; Santa Cruz, CA) for staining pyramidal neurons. After rinsing with PBS (3 times, 5 min each), sections were stained with the corresponding secondary antibodies and mounted. Cell labelling was examined with a confocal microscope (LSM510 META NLO; Carl Zeiss, Oberkochen, Germany).

Hippocampal slice culture and transfection. Hippocampal slices (350 µm thick) were prepared from P7 Sprague-Dawley rats with a vibratome (VT1200S; Leica, Wetzlar, Germany) and mounted onto 0.4-µm Millicell culture inserts (EMD Millipore, Billerica, MA). At DIV 11, slices were transfected biolistically with a PDS1000/He Biolistic Gene Gun (Bio-Rad, Hercules, CA) using 1.6-µm gold microcarriers. At 2 to 4 days after transfection, cultures were transferred to the recording chamber and constantly perfused at 29-30 °C with oxygenated artificial cerebrospinal fluid (ACSF; 95% O₂ and 5% CO₂) containing 125 mM NaCl, 2.5 mM KCl, 2 mM CaCl₂, 1 mM MgCl₂, 1.25 mM NaH₂PO₄, 26 mM NaHCO₃, 20 mM glucose and 200 µM Trolox (Sigma-Aldrich, St. Louis, MO). In some experiments, we added tetrodotoxin (Wako, Osaka, Japan; 1 µM), bicuculline methiodide (Sigma-Aldrich; 12 µM) or lactacystin (EMD Millipore; 10 µM) to the culture and recording medium.

In utero electroporation. This procedure was performed according to a published protocol with minor modifications. Briefly, pregnant C57BL/6 mice were anaesthetized with isoflurane at embryonic day 13 (E13) or 14.5 (E14.5), and AS-PaRac1-Venus and filler constructs (2 µg each) were injected unilaterally into the ventricle. Electrical pulses (electrode diameter ϕ 3 mm for E13 and ϕ 5 mm for E14.5; 33 V, 50-ms pulse length, 950-ms pulse interval, 4 pulses) were delivered unilaterally to target the M1 cortex.

AAV viral production. AAV was produced with the AAV Helper-Free System (Agilent Technologies, Santa Clara, CA). The pRep-Cap (AAV5; Applied Viromics, Fremont, CA) and pHelper plasmids were co-transfected into AAV-293 cells with polyethylenimine ‘Max’ (Polysciences, Warrington, PA). After a 72-h incubation, cells were harvested and lysed with five freeze-thaw cycles. The resulting supernatants were overlaid on 40% sucrose solution containing 100 mM Tris-HCl (pH 8.0), 150 mM NaCl and 0.01% BSA (v/v), and were centrifuged at 100,000g for 16 h at 4 °C. The pellet (crude viral particles) was treated with 1,000 U Benzonase nuclease (Novagen, Madison, WI) for 1 h at 37 °C. After filtering through a 5-µm syringe filter to remove debris, the filtered material was subjected to CsCl gradient centrifugation (1.25 g ml⁻¹ and 1.50 g ml⁻¹) at 257,300g for 48 h at 15 °C. The virus-rich fraction was recovered, and the solvent was replaced with ACSF (1 mM MgCl₂, 10 mM HEPES, CaCl₂-free). Virus titre was determined by quantitative real-time PCR (SYBR Green; Takara Bio Inc., Shiga, Japan).

Virus injection and open-skull cranial window surgery. Adult male C57BL/6 mice were anaesthetized with isoflurane, and mannitol (4 µg per g of body weight) and dexamethasone (7 µg per g of body weight) were administered intraperitoneally to prevent brain swelling. Subcutaneous injections of ketoprofen (40 µg per g of body weight) and penicillin/streptomycin (4 U per g of body weight) were given for 4 consecutive days, beginning 1 day before the operation, to prevent inflammation. The skull was exposed over the M1 cortex based on stereotaxic coordinates. Then, 1 µl of AAV (0.5 to 8.0 × 10^… genome copies ml⁻¹) was injected into the M1 cortex through a glass pipette (tip diameter 30 µm, bevelled at 45°) at a rate of 150 nl min⁻¹ using a syringe pump (Legato130; Muromachi Kikai, Tokyo, Japan). The injection site was standardized among animals using stereotaxic coordinates from the skull (AP = −0.8; ML = ±1.0; DV = +0.5). At the end of the injection, we waited 5 min before retracting the pipette. Stainless steel trephines (ϕ 2.7 mm; Fine Science Tools, Foster City, CA) were used to make a circular open-skull window. To avoid brain damage, drilling was performed intermittently at 10,000 r.p.m. under continuous gentle perfusion of oxygenated ACSF, and we avoided applying excessive drilling pressure to the skull as far as possible. If no bleeding was detected, the drilled hole was covered with a circular coverslip (ϕ 2.7 mm, <0.1 mm thick; Matsunami Glass, Kishiwada, Japan) and sealed with dental cement (Fuji Lute BC; GC, Tokyo, Japan), after which the headgear for in vivo imaging was attached.

Two-photon imaging, glutamate uncaging, and photoactivation. Two-photon imaging was performed with an upright microscope (BX61WI; Olympus, Tokyo, Japan) equipped with an FV1000 laser scanning microscope system (FV1000, Olympus) and water-immersion objective lenses (LUMPlanFL N, 60×, 1.0 N.A.; XLPLN25XWMP2, 25×, 1.05 N.A.). Two mode-locked, femtosecond-pulse Ti:sapphire lasers (MaiTai DeepSee and HP; Spectra Physics, Mountain View, CA) were used at 1,000 nm for dual-colour imaging (Venus and mRFP) and at 720 nm for glutamate uncaging. For three-colour imaging of mTurquoise/GCaMP6/mRFP, two independently captured images at 780 nm (mTurquoise and mRFP) and 970 nm (GCaMP6 and mRFP) were merged based on the shared mRFP signal. For in vitro imaging, 10-40 xy images (5× digital zoom, 512 × 512 pixels) were captured with a z-axis step size of 0.5 µm. For in vivo imaging, mice were anaesthetized with isoflurane, and images (2× digital zoom, 1,024 × 1,024 pixels) were captured from the dura into the brain tissue to a total depth of up to 650 µm with a step size of 1.0 µm. For glutamate uncaging, 8 mM MNI-glutamate (Tocris Bioscience, Bristol, UK) was dissolved in Mg²⁺-free ACSF containing 1 µM tetrodotoxin and applied locally onto the dendrites through a glass pipette, in the presence or absence of 10 µM forskolin (Wako) and 5 µM anisomycin (Sigma). Repetitive photolysis of MNI-glutamate at the spine heads (5 Hz, 80 times) was performed at 720 nm with a pulse duration of 0.6 ms, and the intensity of the uncaging laser was 6 mW under the objective lens.

Data quantification. xy images were stacked by summing the fluorescence values at each pixel. For spine size estimation, individual spines on the dendrites were traced manually, and the fluorescence intensity of the filler (mRFP, DsRed Ex2 or mTurquoise) was measured in the spine head. For each channel, background intensity was subtracted from the fluorescence intensity (arbitrary units, a.u.) of each spine. During time-lapse imaging, daily variations in recording conditions caused slight changes in fluorescence intensity, which were corrected using the fluorescence intensity changes of the filler along the parental dendritic shaft within 10 µm of the spine. The ‘spine enrichment index’ was estimated as described previously. To assess the uneven distribution of PaRac1 variants in the dendrite, the ‘hot spot index’ was calculated using the following equation:

$$\text{Hot spot index} = \frac{1}{n}\sum_{i=1}^{n}\left|\text{Spine enrichment index}_{i} - \text{Spine enrichment index}_{i+1}\right|$$

where ‘Spine enrichment index_i’ and ‘Spine enrichment index_(i+1)’ represent the enrichment indices of a given spine and of its nearest neighbouring spine, respectively, and n is the number of spines in the measured dendritic branch (20 µm long). The hot spot index was obtained from the most intensively labelled dendritic segments and estimated by repeated measurements of sequential nearest-neighbouring spines. Fluorescence was quantified with ImageJ software.
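
The two per-spine measures described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the drift correction is assumed to scale the background-subtracted spine intensity by the ratio of the (background-subtracted) parental-shaft filler intensity on a reference day to that on the current day, and the hot spot index sums the absolute differences over consecutive neighbouring spines and divides by n as in the printed formula. With n spines there are only n − 1 neighbour pairs, and the handling of the last spine is not stated in the text, so that edge case is our assumption. All function names are hypothetical.

```python
def corrected_spine_size(spine_raw: float, background: float,
                         shaft_today: float, shaft_reference: float) -> float:
    """Background-subtracted spine-head intensity, rescaled for daily drift.

    Assumption (ours): shaft_today and shaft_reference are background-subtracted
    filler intensities of the parental shaft within 10 um of the spine, and drift
    is corrected by their ratio.
    """
    if shaft_today <= 0:
        raise ValueError("shaft intensity must be positive")
    return (spine_raw - background) * shaft_reference / shaft_today


def hot_spot_index(sei: list[float]) -> float:
    """Hot spot index from spine enrichment indices ordered along a branch.

    Sums |SEI_i - SEI_(i+1)| over consecutive (nearest-neighbour) spines and
    divides by n, the number of spines, following the printed formula.
    """
    n = len(sei)
    if n < 2:
        raise ValueError("need at least two spines")
    return sum(abs(a - b) for a, b in zip(sei, sei[1:])) / n


print(corrected_spine_size(120.0, 20.0, shaft_today=40.0, shaft_reference=50.0))
print(hot_spot_index([1.0, 3.0, 1.0, 1.0]))
```

A uniformly labelled branch (all indices equal) gives a hot spot index of 0, and the index grows as labelling alternates sharply between neighbouring spines.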

In vivo photoactivation in freely moving animals. Mice transduced by either AAV injection or in utero gene transfer underwent open-skull cranial window surgery, and the cranial holes were covered with bilateral glass windows. An outer cylinder (a non-bevelled, 15-mm-long 18G needle with an inner diameter of 0.9 mm) was implanted on the glass window for photoactivation. Before photoactivation, the optical fibre was inserted into the outer cylinder, and the fibre tip was placed directly onto the glass coverslip. The fibre and the outer cylinder were locked tightly together with Blu-Tack, which was easily removed after photoactivation. Photostimulation was carried out using the COME-2 series (Lucir, Osaka, Japan), which consists of 457-nm laser diodes, an optical swivel and bilateral optical fibres (COME2-«DF1; core diameter 500 µm, 0.5 N.A.). The laser diode was adjusted to an output of 20 mW at the tip of each fibre. Light pulses of 150 ms were delivered at 1 Hz for 1 h under the control of customized LabVIEW programs (National Instruments, Austin, TX).
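
The stated stimulation parameters (150-ms pulses at 1 Hz for 1 h, 20 mW per fibre tip) fix the pulse train completely: 3,600 pulses, a 15% duty cycle, a time-averaged power of 3 mW and about 10.8 J of light leaving each fibre tip per session, ignoring any losses at the coverslip. The sketch below only derives these quantities; it is not the authors' LabVIEW program, and the `PulseProtocol` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PulseProtocol:
    """Parameters stated in the text; the class itself is illustrative."""
    pulse_ms: float = 150.0      # light pulse width
    frequency_hz: float = 1.0    # pulse rate
    duration_s: float = 3600.0   # 1-h session
    power_mw: float = 20.0       # output at each fibre tip

    def n_pulses(self) -> int:
        return round(self.duration_s * self.frequency_hz)

    def onsets_s(self) -> list[float]:
        """Pulse onset times in seconds from the start of the session."""
        period = 1.0 / self.frequency_hz
        return [i * period for i in range(self.n_pulses())]

    def duty_cycle(self) -> float:
        return self.pulse_ms / 1000.0 * self.frequency_hz

    def energy_per_fibre_j(self) -> float:
        """Light energy leaving one fibre tip over the session (mW x s = mJ)."""
        return self.power_mw * (self.pulse_ms / 1000.0) * self.n_pulses() / 1000.0


protocol = PulseProtocol()
print(protocol.n_pulses(), protocol.duty_cycle(), protocol.energy_per_fibre_j())
```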

Behavioural analysis. Mice were housed under standard laboratory conditions (12-h light/dark cycle with food and water available ad libitum) and were randomly allocated to experimental groups. All behavioural analyses were performed during the light phase. For motor learning (Extended Data Fig. 9), we used a rotarod training system (Rota-Rod Treadmills ENV-576; Med Associates, St. Albans, VT). Before the training sessions, mice were habituated to the stationary rod for 2 min. During the training period, a fixed-speed protocol was applied at a slow speed (8 r.p.m.), so that mice rarely fell off the rod. Once the mice could remain on the rod reliably, the speed was increased stepwise to 40 r.p.m. We applied air puffs to the hind limbs as aversive stimuli to teach mice to face forward on the rod (Extended Data Fig. 9c), which helped them to hold on at higher speeds. After falling, mice were immediately placed back on the rod, and the latency to fall was recorded automatically. Three training sessions were performed over 2 days (2 h per session, 6 h of training in total). To assess learning, three trials of the rotarod test were carried out using an accelerating protocol (4 to 40 r.p.m.) without air puffs, with 5-min inter-trial intervals.


For balance beam training, a hand-made beam apparatus was used (Extended Data Fig. 9d). Time to cross was scored using a stopwatch. The timer was started when the mouse was placed on the beam and stopped when its first forepaw was placed in the goal cage. Air puffs to the hind limbs were also used to facilitate learning. Three training sessions were performed over 2 days (2 h per session, 6 h of training in total). To evaluate the acquired performance, three trials of the beam test were carried out without air puffs, with 5-min inter-trial intervals. For both the rotarod and beam tasks, task performance was calculated as the average of the three trials. Mice with an improvement of <20% compared with pre-training performance were excluded from the analyses. The running speed of mice was measured with a video tracking system (Limelight3; Actimetrics, Wilmette, IL). The investigator was not blinded to group allocation during the experiments because all behavioural outcomes were unambiguously determined: for example, rotarod performance and locomotion were scored automatically by infrared or video tracking, and manual scoring of the crossing time in the beam test was unambiguous.
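
The performance measure and exclusion rule above can be sketched as follows. The text does not say how improvement was computed for the beam task, so the sign convention here is our assumption: rotarod latency improves upward, and beam crossing time improves downward (`lower_is_better=True`). Function names are hypothetical.

```python
from statistics import mean


def task_performance(trials: list[float]) -> float:
    """Average of the three test trials (rotarod latency or beam crossing time)."""
    if len(trials) != 3:
        raise ValueError("expected three trials")
    return mean(trials)


def improvement_pct(pre: float, post: float, lower_is_better: bool = False) -> float:
    """Percentage improvement over pre-training performance (assumed convention)."""
    change = (pre - post) if lower_is_better else (post - pre)
    return 100.0 * change / pre


def include_mouse(pre_trials, post_trials, lower_is_better=False, min_pct=20.0) -> bool:
    """Keep a mouse only if it improved by at least 20% over pre-training."""
    pre = task_performance(pre_trials)
    post = task_performance(post_trials)
    return improvement_pct(pre, post, lower_is_better) >= min_pct


print(include_mouse([10, 20, 30], [30, 30, 30]))                     # True (rotarod)
print(include_mouse([10, 10, 10], [9, 9, 9], lower_is_better=True))  # False (beam)
```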

Statistics. Each series of experiments was performed in two, and mostly three, separate cohorts, and sample size was chosen based on the effect size observed in the first cohort in order to minimize the number of animals used, in compliance with ethical guidelines. Data are shown as means ± s.e.m. Detailed information on statistical methods and results is given in Extended Data Table 1. In brief, Mann-Whitney U tests were used to identify significant differences between two groups. Multiple comparisons were made by one-way analysis of variance (ANOVA; normal distribution and equal variances), nonparametric one-way ANOVA (Kruskal-Wallis test; unequal variances), or one-way repeated-measures ANOVA followed by post hoc Bonferroni tests (to compare task performance at different time points within subjects). Spearman rank correlation was used to test the strength of correlation between two variables. For all statistical tests, P < 0.05 was considered significant (*P < 0.05, **P < 0.01, ***P < 0.001). No statistical methods were used to predetermine sample size; the experiments were randomized, and the investigators were not blinded to outcome assessment.
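
Spearman's rank correlation, used above for relationships such as photoactivation effect versus learning attainment, is the Pearson correlation of the ranks, with tied values sharing their average rank. The following pure-Python sketch computes only the coefficient rho; P values require the null distribution of rho, which statistical packages provide (for example, SciPy's `scipy.stats.spearmanr` returns both). The example data are invented for illustration.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the (average) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: larger learning attainment paired with a more negative effect.
print(spearman_rho([0.1, 0.4, 0.2, 0.9], [5, 3, 4, 1]))
```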

Extended Data Figure 1 | Optimization of PaRac1 for synaptic application. a, Isothermal titration calorimetry (ITC) experiments showing that introducing the L514K and L531E mutations into the original PaRac1 construct reduced binding to the CRIB domain of PAK1 in the dark. The light-insensitive LOV2(C450A) and the I539E mutant, which mimics the unfolded ‘lit state’, were used as negative and positive controls, respectively. b, Leaky activity of PaRac1 in the dark. In hippocampal neuronal cultures transfected with the original PaRac1, we observed a bearded appearance of the soma with numerous ectopic dendrites, whereas neurons transfected with PaRac1 (L514K/L531E) were indistinguishable from normal neurons. c, Assessment of the affinity of PaRac1 for endogenous PAK1 using a pull-down assay. HEK293 cells transfected with PaRac1-Venus were divided into two groups: lit and dark. Cells in the lit group were illuminated with a white fluorescent lamp before cell lysis, and illumination continued during the subsequent immunoprecipitation until the final wash step of the protein precipitates. Conversely, cells in the dark group were handled under a yellow fluorescent lamp, which excludes wavelengths below 500 nm. Co-immunoprecipitation with PAK1 revealed that PaRac1 (L514K/L531E) barely bound PAK1 in the dark (the number of trials is depicted in the bar graph; **P < 0.01, Mann-Whitney U test). d, Targeting of PaRac1 to the postsynaptic density. PSDΔ1.2-PaRac1 (DTE(−)) was transfected into dissociated cortical neurons at 21 days in vitro (DIV). Two days after transfection, cells were fixed with 4% PFA and permeabilized for immunostaining. Axons and endogenous PSD-95 were visualized with anti-phospho-neurofilament and anti-PSD-95 antibodies, respectively, revealing that PSDΔ1.2-PaRac1 co-localized with endogenous PSD-95. Note that PSDΔ1.2-PaRac1 did not co-localize with the axonal marker.
Extended Data Figure 2 | The distribution of AS-PaRac1 is regulated by neuronal activity and depends on the dendritic targeting element (DTE). a, Experimental design. b, Representative image of a cultured hippocampal neuron. c, Bicuculline (BIC) or tetrodotoxin (TTX) was added to the culture medium at the designated time points. Images were captured at high magnification and tiled to visualize the entire cell. Green circles represent AS-PaRac1 puncta. d, Quantification of AS-PaRac1 distribution (n = 6 each; *P < 0.05, **P < 0.01, Kruskal-Wallis test followed by post-hoc Dunnett's test). e, Concomitant accumulation of AS-PaRac1 and SEP-GluA1 in spines. Neurons were co-transfected with mTq (mTurquoise, filler), SEP-GluA1 and AS-PaRac1-mRFP, and the constructs were expressed for 36 h. Spines potentiated during the 36 h were identified by SEP-GluA1 fluorescence (arrowheads). Spearman rank correlation revealed a significant correlation between the spine enrichment indices of SEP-GluA1 and AS-PaRac1 (each circle represents one spine; 235 spines, 29 dendrites). f, Schematic of the constructs and representative images of single-spine potentiation by glutamate uncaging in the presence of FSK (arrowheads). Rat hippocampal slice cultures were biolistically transfected with either AS-PaRac1 or PSD-PaRac1 (DTE(−)), followed by uncaging experiments at DIV 13 (equivalent to postnatal day 20). g, Time course of spine head volume (V) and Venus accumulation upon uncaging. The mean changes in spine size and Venus accumulation in the stimulated or neighbouring spines 60 min after uncaging are depicted. For quantification, we used pooled data from independent, identically designed experiments. The data set for AS-PaRac1 is identical to the FSK-treated group of Fig. 1c–e. Scale bars, 1 μm. *P < 0.05, Mann-Whitney U test (n = 6 or 11 dendrites for PSD-PaRac1 or AS-PaRac1, respectively).


©2015 Macmillan Publishers Limited. All rights reserved 


ARTICLE 


[Extended Data Figure 3 schematic. a, PSD-PaRac1 (PSDΔ1.2-Venus-PaRac1, without DTE): somatic translation; robust protein expression (generation > degradation); diffusion throughout dendrites; integration into the PSD by constitutive turnover and by structural potentiation; expression proportional to spine size. b, AS-PaRac1 (PSDΔ1.2-Venus-PaRac1 with DTE; legend: endogenous PSD-95, Venus (probe), ribosome, probe mRNA, proteasome): (1) little somatic translation and little diffusion throughout dendrites; (2) DTE of the mRNA; under basal conditions degradation exceeds generation, giving little protein expression and little integration into the PSD; (3) activity-dependent structural potentiation and local translation; (4) effective capture by the potentiated spine; (5) probe in the PSD is resistant to degradation; (6) probe in the shaft is sensitive to degradation; result: potentiation-specific accumulation. Right, without and with lactacystin (w/o Lac, w/ Lac): constitutive labelling due to the lack of degradation, with expression proportional to spine size and shaft distribution.]




Extended Data Figure 3 | Putative cellular mechanisms of the specific concentration of AS-PaRac1 in potentiated spines. a, Uniform labelling of spines with the PSD-PaRac1 construct, which lacks the DTE of the Arc 3′ UTR (Fig. 1a, construct B). PSD-PaRac1 is translated in the soma, which is abundantly equipped with translational machinery. Therefore, somatic expression of the probe is high (data not shown) and would exceed its degradation, and the resulting protein is transported throughout the dendrites. The overflowing probe integrates into the postsynaptic density (PSD) during the constitutive turnover of PSD molecules, so probe expression is proportional to spine size. b, Selective labelling of potentiated spines with AS-PaRac1 (Fig. 1a, construct C). The following six mechanisms confer potentiation-specific labelling on AS-PaRac1. (1) Little somatic translation: AS-PaRac1 is only moderately expressed, which limits translation of the AS-PaRac1 protein in the soma (see Extended Data Fig. 2b); therefore, non-specific overflow of the probe from the soma into the dendrites is minimal. (2) Dendritic targeting element (DTE): the essential domains of AS-PaRac1 are the N-terminal region of PSD-95 (PSDΔ1.2) and the 3′ UTR of Arc mRNA (DTE). DTEs have a pivotal role in the dendritic targeting of mRNAs. One of the best-known DTEs is present in Arc mRNA, which is targeted to stimulated dendritic segments in an activity-dependent manner. Transport of the mRNA out of the soma also contributes to the limited somatic translation of the probe described in (1). In the absence of activation, the limited amount of translational machinery and the presence of degradation components in the dendrites keep the locally translated probe at a low level, which results in a low rate of AS-PaRac1 integration into the PSD during the constitutive turnover of PSD proteins. (3) Local protein synthesis: persistent structural plasticity of the spine depends on activity-dependent dendritic protein synthesis, and translation of Arc mRNA is controlled by activity levels. (4) Effective capture of PSD proteins by structurally potentiated spines: a potentiated spine, which rapidly requires new copies of PSD proteins, captures diffusing PSD proteins more efficiently. (5) Increased stability of AS-PaRac1 in the PSD: the stability of PSD-integrated AS-PaRac1 likely increases, as does that of typical PSD scaffold proteins. Ubiquitination might underlie this increased stability, because the ubiquitination site of AS-PaRac1 resides in the N-terminal domain of PSD-95, which aggregates to form head-to-head multimers in the postsynaptic scaffold. Thus, once AS-PaRac1 is integrated into the PSD, the ubiquitination site may be concealed, and AS-PaRac1 becomes relatively stable. (6) Sensitivity of unbound AS-PaRac1 to proteasomal degradation: in contrast to PSD-integrated AS-PaRac1, unbound AS-PaRac1 is sensitive to degradation because its ubiquitination site is not concealed. This scenario is supported by the application of lactacystin (right panel), a proteasome inhibitor that completely disrupts the uneven distribution of AS-PaRac1. Similar mechanisms apply to newly formed spines, because spine formation is associated with spine enlargement.
Extended Data Figure 4 | Raw data of quantification and synaptic mapping. Data from Fig. 2. a, Quantification of spine size (based on DsRed fluorescence) and AS-PaRac1 fluorescence after learning, depicted separately according to spine classification. The definitions of 'New spine', 'Enlarged spine' and other classes are given on the right. Each arrow indicates the trajectory of one spine; beginning and end points represent the absolute values before and after the rotarod task, respectively. b, xy images were captured from the dura to a depth of 300 μm with a step size of 1.0 μm and stacked by summing the fluorescence values at each pixel. z-stacked images of 10 overlapping fields were aligned to generate the combined images. AS-PaRac1 and AS-PaRac1/DsRed merged images are shown. AS-PaRac1 that was present before learning (−1 day, yellow) or that appeared shortly after learning (learning period, 0 day, green), 1 day after learning (after-1, +1 day, blue) or 2 days after learning (after-2, +2 day, purple) is colour-coded to show the spatiotemporal distribution of AS-PaRac1 triggered in each period. c, Time course of the number and fraction of AS-PaRac1-positive spines in each period. d, Calculation of the learning-evoked spine/neuron ratio (%). The example calculation is based on the raw data shown in b and c. The table compares neurons in layer II/III (in utero electroporation at E14.5) and layer V (in utero electroporation at E13).
Extended Data Figure 5 | Assessment of AS-PaRac1 puncta on the dendritic shaft. a, The two possible synapse types that AS-PaRac1 puncta on the dendritic shaft may represent. xy images were captured to encompass the entire z-range of the dendrite of interest with a step size of 0.5 μm, and images were stacked by summing the fluorescence values at each pixel. The fluorescence of the filler would increase if AS-PaRac1 emerged on a dendritic spine undergoing structural potentiation; in contrast, the fluorescence of the filler would not increase if AS-PaRac1 were located at a shaft synapse. b, Example of dendrites before and after the emergence of AS-PaRac1. AS-PaRac1 puncta on the shaft and on the dendritic spine are … the calculation of fluorescence in each punctum is shown. c, Quantification of the fluorescence of the filler and AS-PaRac1 upon the emergence of AS-PaRac1 puncta. Each arrow indicates the trajectory of one ROI; beginning and end points represent the absolute values before and after the emergence of AS-PaRac1, respectively. The ROI at (i) exhibited a concomitant … in a typical dendritic spine (ii). All examined AS-PaRac1 puncta on the dendritic shaft exhibited positive correlations, suggesting that the majority of AS-PaRac1 puncta emerge on dendritic spines during their structural changes.
Extended Data Figure 6 | Rac1-dependent shrinkage of dendritic spines induced by low-frequency photoactivation. a, Photoactivation protocol. Photoactivation was performed in a region encompassing the branch of interest. b, Neurons in hippocampal slice cultures (DIV 11) were biolistically transfected with the DNA constructs shown schematically on the left. Representative dendritic images upon photoactivation are shown on the right. Robust shrinkage (arrowheads) was observed in spines of neurons transfected with AS-PaRac1 driven by the SARE-Arc promoter. AS-PaRac1-negative spines were not affected by photoactivation, despite being adjacent to AS-PaRac1-positive spines. c, Time course of spine head volume (V) in Venus-positive (upper panel) and Venus-negative (lower panel) spines. White, red and blue circles represent CAG::AS-PaRac1, SARE::AS-PaRac1 and SARE::PSDΔ1.2-LOV-DTE, respectively (n = 12 cells each). d, Mean relative change in spine head size in Venus-positive and Venus-negative spines 60 min after photoactivation. Scale bars, 2 μm. *P < 0.05 and ***P < 0.001, Kruskal-Wallis test followed by post-hoc Scheffé's test.




[Extended Data Figure 7 panels. a, Bilateral cranial windows (glass window thickness < 0.1 mm) with optical fibres (core diameter 500 μm, N.A. 0.5), outer cylinder, cement and headgear; window centre at AP −1.1, ML +1.0. b, Spines before and after photoactivation (PA). c–e, Quantification for AS(+) and AS(−) spines in the re-training and home-cage groups. f, mRFP/Emx1/TO-PRO3 staining in layer II/III and layer V.]
Extended Data Figure 7 | Spine shrinkage in broad areas of the bilateral motor cortices induced by blue laser illumination. a, Schematic of the bilateral cranial windows, optical fibres and the photoactivation protocol. b, Representative images of spine shrinkage in the M1 cortex upon photoactivation in vivo. AS-PaRac1-positive spines (green arrowheads) shrank, whereas AS-PaRac1-negative spines (white arrowheads) did not. Quantification of spine size is shown on the right. c, The mean number of AS-PaRac1 puncta per field was calculated in the mice shown in Fig. 4i. d, e, Spine structure and AS-PaRac1 were imaged in mice subjected to the re-training and home-cage protocols shown in Fig. 5. The majority of AS-PaRac1-positive spines displayed photoactivation-induced shrinkage and subsequent recovery. *P < 0.05, Mann-Whitney U test. f, Successful AAV5 vector injection into the bilateral M1 cortex was confirmed by the presence of mRFP fluorescence after behavioural tests. Highly efficient viral infection of layer II/III and V pyramidal neurons was demonstrated by immunostaining for Emx1, which labels pyramidal neurons. Mice without bilateral mRFP signal in the M1 cortex were excluded from the data analysis.
Extended Data Figure 8 | No effect of photoactivation on the locomotor activity of mice. a, Experimental schedule. The running speed of AS-PaRac1-injected mice in protocol no. 1 (Fig. 4a) was measured with a video-tracking system. To minimize circadian effects on locomotion, mice were tested at the same time of day before and after photoactivation. b, Representative locomotion traces and temporal sequences of running speed. c, Statistical analysis shows that photoactivation has only a negligible effect on running speed.
Extended Data Table 1 | Detailed information on sample descriptions and statistics

Figure 1a (hippocampal slice culture, gene gun). n: construct (A) = 13 dendrites/13 slices/3 rats; construct (B) = 20 dendrites/20 slices/6 rats; construct (C) = 23 dendrites/23 slices/6 rats; construct (D) = 8 dendrites/8 slices/3 rats; construct (E) = 8 dendrites/8 slices/3 rats. Method: one-way factorial ANOVA. P values (enrichment | hot spot): (A) vs (B) 0.03896 | 0.48912; (A) vs (C) 0.00001 | 0.00000; (A) vs (D) 0.69713 | 0.81024; (A) vs (E) 0.00130 | 0.00929.
Figure 1b–d (hippocampal slice culture, gene gun). n: uncaging alone (A) = 15 dendrites/15 slices/6 rats; uncaging + FSK (B) = 35 dendrites/35 slices/8 rats; uncaging + Aniso (C) = 20 dendrites/15 slices/6 rats. Method: Kruskal-Wallis test (post-hoc Scheffé's test). P values (mRFP | Venus): (A) vs (B) 0.62258 | 0.01139; (A) vs (C) 0.00430 | 0.10843; (B) vs (C) 0.00000 | 0.00000.
Figure 1e (hippocampal slice culture, gene gun). n: vehicle = 13 dendrites/13 slices/2 rats; lactacystin = 8 dendrites/8 slices/3 rats. Method: Mann-Whitney test (two-sided). P values (enrichment | hot spot): vehicle vs lactacystin 0.00134 | 0.00030.
Figure 2d (in vivo M1 cortex, in utero EP at E14.5). n: training = 2,793 spines/7 mice; no training = 718 spines/3 mice. Method: Mann-Whitney test (two-sided). P values, training vs no training (enlarged | new | shrunk | eliminated): 0.03887 | 0.02014 | 0.12134 | 0.12134.
Figure 2f. n: training = 2,090 spines/3 mice. Method: Spearman's rank correlation coefficient. ΔV (0 day) & AS (0 day): P = 0.00000, r = 0.61278.
Figure 2g. Same sample and method as Fig. 2f. ΔV (0 day) & AS (−1 day): P = 0.51746, r = −0.03549.
Figure 2k. n: 68 spines (enlarged or new spines) out of the 2,090 total spines used for Fig. 2e–j. Method: Mann-Whitney test (two-sided). P values, [AS (+1 day) ≥ 1] vs [AS (+1 day) < 1] (enlarged | new): 0.00263 | 0.00016.
Figure 3c (in vivo M1 cortex, in utero EP). n: 94 spines/6 mice. Method: Mann-Whitney test (two-sided). [AS (+)] vs [AS (−)]: P = 0.00000.
Figure 3d. n: 94 spines/6 mice. Method: Spearman's rank correlation coefficient. ΔV & depth: P value and correlation coefficient illegible.
Figure 3i, j (hippocampal slice culture, gene gun). n: 24 spines/12 slices/6 mice. Method: Spearman's rank correlation coefficient. P values (AS (+) | AS (−)) and correlation coefficients (AS (+) | AS (−)): ΔV & ΔAmp 0.01793 | 0.57016 and 0.58235 | 0.23810; ΔV & ΔFreq 0.58679 | 0.95537 and −0.14706 | 0.02381.
Figure 3i, j. n: illegible. Method: Wilcoxon signed-rank test, (before PA) vs (after PA). P values (amplitude | frequency): 0.24886 | 0.09620.
Figure 4b–d. Method: ANOVA. P values (mRFP (protocol #1) | AS (protocol #1) | AS (protocol #2) | AS (protocol #3)): (pre-training) vs (0 day) illegible | 0.00000 | 0.00000 | 0.00034; (0 day) vs (+1 day) NA | NA | 0.83880 | NA; (comparison illegible) NA | NA | NA | 1.00000.
Macromolecular complexes are essential to conserved biological processes, but their prevalence across animals is unclear. By combining extensive biochemical fractionation with quantitative mass spectrometry, here we directly examined the composition of soluble multiprotein complexes among diverse metazoan models. Using an integrative approach, we generated a draft conservation map consisting of more than one million putative high-confidence co-complex interactions for species with fully sequenced genomes that encompasses functional modules present broadly across all extant animals. Clustering reveals a spectrum of conservation, ranging from ancient eukaryotic assemblies that have probably served cellular housekeeping roles for at least one billion years, through ancestral complexes that have accrued contemporary components, to rarer metazoan innovations linked to multicellularity. We validated these projections by independent co-fractionation experiments in evolutionarily distant species, affinity purification and functional analyses. The comprehensiveness, centrality and modularity of these reconstructed interactomes reflect their fundamental mechanistic importance and adaptive value to animal cell systems.


Introduction 


Elucidating the components, conservation and functions of multiprotein complexes is essential to understanding cellular processes, but mapping physical association networks on a proteome-wide scale is challenging. The development of high-throughput methods for systematically determining protein-protein interactions (PPIs) has led to global molecular interaction maps for model organisms including E. coli, yeast, worm, fly and human. In turn, comparative analyses have shown that PPI networks tend to be conserved, evolve more slowly than regulatory networks, and closely mirror function retention across orthologous groups. Yet fundamental questions remain. Here we define: (i) the extent to which physical interactions are preserved between phyla; (ii) the identity of protein complexes that are evolutionarily stable across animals; and (iii) their distinctive macromolecular composition, phylogenetic distribution and phenotypic significance.


Generating a high-quality conserved interaction dataset 


As previous cross-species interactome comparisons, based on experimental data from different sources and methods, show limited overlap, we sought to produce a more comprehensive and accurate map of protein complexes common to metazoa by applying a standardized approach to multiple species. We employed biochemical fractionation of native macromolecular assemblies followed by tandem mass spectrometry to elucidate protein complex membership (Fig. 1; see Supplementary Methods). Because previous application of this co-fractionation strategy to human cell lines preferentially identified vertebrate-specific protein complexes, we selected eight additional species for study on the basis of their relevance as model organisms, spanning roughly a billion years of evolutionary divergence (Fig. 1a). The resulting co-fractionation data (Fig. 1b) acquired for Caenorhabditis elegans (worm), Drosophila melanogaster (fly), Mus musculus (mouse), Strongylocentrotus purpuratus (sea urchin) and human were used to discover conserved interactions (Fig. 1c), whereas the data obtained for Xenopus laevis (frog), Nematostella vectensis (sea anemone), Dictyostelium discoideum (amoeba) and Saccharomyces cerevisiae (yeast) were used for independent validation. Details of the cell types, developmental stages and fractionation procedures used are provided in Supplementary Table 1.

Figure 1 | Workflow. a, Phylogenetic relationships of the organisms analysed in this study. We fractionated soluble protein complexes from worm (C. elegans) larvae, fly (D. melanogaster) S2 cells, mouse (M. musculus) embryonic stem cells, sea urchin (S. purpuratus) eggs and human (HEK293/HeLa) cell lines. Holdout species ('T', for test) analysed in the same way were frog (X. laevis), an amphibian; sea anemone (N. vectensis), a cnidarian with primitive eumetazoan tissue organization; slime mould (D. discoideum), an amoeba; and yeast (S. cerevisiae), a unicellular eukaryote. b, Protein fractions were digested and

We identified and quantified (see Supplementary Methods) 13,386 protein orthologues across 6,387 fractions obtained from 69 different experiments (Fig. 2a), an order-of-magnitude expansion in data coverage relative to our original (H. sapiens only) study. Individual pairwise protein associations were scored on the basis of the fractionation profile similarity measured in each species. Next, we used an integrative computational scoring procedure (Fig. 1c; see Supplementary Methods) to derive conserved interactions for human proteins and their orthologues in worm, fly, mouse and sea urchin, defined as high pairwise protein co-fractionation in at least two of the five input species. The support vector machine classifier was trained (using fivefold cross-validation) on correlation scores obtained for conserved, annotated reference protein complexes (see Supplementary Methods), and it combined all of the input species co-fractionation data with previously published human and fly interactions and additional supporting functional association evidence (HumanNet). Measurements of overall performance showed high precision with reasonable recall from the co-fractionation data alone (Fig. 2b), with external data sets serving only to increase precision and recall, as we required all derived interactions to have extensive biochemical support (see Supplementary Methods). The co-fractionation data of each input species contributed to overall performance, in each case increasing precision and recall (Extended Data Fig. 1a). The final filtered interaction network consists of 16,655 high-confidence co-complex interactions in human (Supplementary Table 2). All of the interactions were supported by direct biochemical evidence in at least two input species, with nearly half (8,121) detected in three or more (Extended Data Fig. 1b), enabling cross-species modelling and functional inference.
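The requirement that an interaction be supported by co-fractionation in at least two of the five input species can be illustrated with a deliberately simplified rule. The sketch below is not the support vector machine described above: it scores each candidate pair by the Pearson correlation of its elution profiles in each species, counts the species in which that correlation reaches a cutoff, keeps pairs supported in at least two species, and then computes precision and recall against a set of reference pairs. The 0.8 cutoff, the data layout and all function names are hypothetical.

```python
from math import sqrt
from typing import Dict, FrozenSet, List, Set, Tuple

Profiles = Dict[str, List[float]]  # protein -> abundance across fractions (hypothetical layout)
Pair = FrozenSet[str]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length elution profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0  # a flat profile carries no co-elution signal
    return cov / (sa * sb)


def species_support(by_species: Dict[str, Profiles], p: str, q: str, cutoff: float = 0.8) -> int:
    """Number of species in which both proteins were detected and co-elute at or above the cutoff."""
    return sum(
        1
        for profiles in by_species.values()
        if p in profiles and q in profiles and pearson(profiles[p], profiles[q]) >= cutoff
    )


def conserved_pairs(by_species: Dict[str, Profiles], candidates: List[Tuple[str, str]],
                    cutoff: float = 0.8, min_species: int = 2) -> Set[Pair]:
    """Keep candidate pairs supported in at least `min_species` species."""
    return {
        frozenset((p, q))
        for p, q in candidates
        if species_support(by_species, p, q, cutoff) >= min_species
    }


def precision_recall(predicted: Set[Pair], reference: Set[Pair]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    data = {  # toy profiles, not data from this study
        "worm": {"A": [0, 1, 5, 1, 0], "B": [0, 2, 9, 2, 0], "C": [4, 1, 0, 0, 1]},
        "fly": {"A": [1, 6, 1, 0, 0], "B": [1, 5, 2, 0, 0]},
        "mouse": {"A": [0, 0, 2, 7, 1]},
    }
    pred = conserved_pairs(data, [("A", "B"), ("A", "C")])
    print(sorted(sorted(pair) for pair in pred))  # [['A', 'B']]
    print(precision_recall(pred, {frozenset(("A", "B")), frozenset(("B", "D"))}))
```

In this toy example, A and B co-elute in both worm and fly and are kept, whereas A and C are anticorrelated in worm and never co-detected elsewhere; the `min_species` parameter mirrors the at-least-two-species requirement stated above.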


Benchmarking protein complexes 


Multiple lines of evidence support the quality of the network: reference complexes withheld during training were reconstructed with higher precision and recall (Fig. 2b; see Extended Data Fig. 1c) than with our human-only map. The interacting proteins were also sixfold enriched (hypergeometric test) for shared subcellular localization annotations in the Human Protein Atlas database and 21-fold enriched for shared disease associations in OMIM, and showed highly correlated human tissue proteome abundance profiles (Extended Data Fig. 2a).
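Fold enrichment and a one-sided hypergeometric P value of the kind reported here can be computed as sketched below. The mapping of counts is an assumption for illustration: N candidate protein pairs, K of which share an annotation (for example, a subcellular localization), n predicted interacting pairs, and k predicted pairs that share the annotation. Exact integer arithmetic via `math.comb` is fine for toy sizes; for proteome-scale counts, a library routine such as SciPy's `scipy.stats.hypergeom` would be preferable.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed overlap k divided by the overlap expected by chance, n*K/N."""
    return k / (n * K / N)


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n items are drawn without replacement from N, of which K are 'successes'."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer ratio, converted to float once


if __name__ == "__main__":
    # Hypothetical toy counts, not data from this study.
    N, K, n, k = 1000, 50, 100, 30
    print(f"fold enrichment = {fold_enrichment(k, n, K, N):.1f}")  # 6.0
    print(f"P(X >= {k}) = {hypergeom_upper_tail(k, n, K, N):.3g}")
```

Fold enrichment describes the size of the effect, whereas the tail probability describes how unlikely that overlap would be if predicted pairs were drawn at random with respect to the annotation.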

To independently verify the reliability of these projections, we examined the co-fractionation profiles of putatively interacting orthologues (interologues) in the four holdout species, obtained by protein quantification across 1,127 biochemical fractions (see Supplementary Methods). Although sequence divergence changed absolute chromatographic retention times (Extended Data Fig. 2b), most of the predicted interactors showed highly correlated co-fractionation profiles in the holdout test species, to a degree comparable to that in the input species used for learning (Fig. 2c). The biochemical data obtained for frog and sea anemone showed slightly better agreement than those for Dictyostelium and yeast, consistent with their shorter evolutionary distance from the input species.

analysed by high-performance liquid chromatography tandem mass spectrometry (LC-MS/MS), measuring peptide spectral counts and precursor ion intensities. c, Integrative computational analysis. After orthologue mapping to human, correlation scores of co-eluting protein pairs detected in each 'input' species were subjected to machine learning together with additional external association evidence, using the CORUM complex database as a reference standard for training. High-confidence interactions were clustered to define co-complex membership.

Besides indicating stably associated proteins, our multispecies biochemical profiles faithfully recapitulated the architecture of multiprotein complexes of known three-dimensional structure, with a general trend for the most correlated protein pairs to be spatially closer (Extended Data Fig. 2c). For example, hierarchical clustering of 30S proteasome subunits according to the chromatographic elution profiles of all five input species correctly separated the 20S and 19S particles, and the regulatory lid from the base sub-complex (Fig. 2d), reflecting known hierarchies of complex formation and disassembly.
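The clustering idea behind Fig. 2d can be sketched as average-linkage agglomerative clustering of proteins, using 1 − r (Pearson correlation of elution profiles) as the distance and stopping at a chosen number of clusters. The linkage rule, the stopping criterion and the toy profiles (labelled core and lid) are assumptions for illustration, not the settings used in the study.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Set


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two elution profiles (0.0 if either is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def average_linkage(profiles: Dict[str, List[float]], n_clusters: int) -> List[Set[str]]:
    """Merge the closest pair of clusters until n_clusters remain (distance = 1 - r)."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    dist = {
        frozenset((p, q)): 1.0 - pearson(profiles[p], profiles[q])
        for p, q in combinations(profiles, 2)
    }
    clusters: List[Set[str]] = [{name} for name in profiles]
    while len(clusters) > n_clusters:
        best_i, best_j, best_d = 0, 1, float("inf")
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster member pairs.
            d = sum(dist[frozenset((p, q))] for p in clusters[i] for q in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if d < best_d:
                best_i, best_j, best_d = i, j, d
        clusters[best_i] |= clusters[best_j]
        del clusters[best_j]  # best_j > best_i, so best_i's index is unaffected
    return clusters


if __name__ == "__main__":
    toy = {  # hypothetical elution profiles across six fractions
        "core1": [5, 4, 0, 0, 0, 0],
        "core2": [6, 5, 1, 0, 0, 0],
        "lid1": [0, 0, 0, 4, 6, 1],
        "lid2": [0, 0, 1, 5, 6, 0],
    }
    for cluster in average_linkage(toy, 2):
        print(sorted(cluster))
```

Proteins that co-elute end up in the same cluster because their correlation distance is small; cutting the tree at different heights would expose the nested sub-complex hierarchy described above.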


Landscape of interaction conservation across species 


Because most of the interacting components were phylogenetically conserved across vast evolutionary timescales, we were able to predict over one million high-confidence co-complex interactions among orthologous protein pairs for 122 extant eukaryotes with sequenced genomes (Supplementary Table 3). The number of interactions ranged from ~8,000 to ~15,000 per species depending on phylum (Fig. 2e), with more projected among Deuterostomes, Protostomes and Cnidaria, which show high component retention, and fewer in Fungi, Plants and, especially, Protists, where the relative paucity of co-complex conservation probably reflects inherent clade diversity, especially in parasite genomes (for example, gene loss among Apicomplexa). While largely congruent with previous smaller-scale studies of PPI conservation, the majority of conserved co-complex interactions are novel (less than one-third are curated in the CORUM, STRING and GeneMANIA databases; Fig. 2e). This markedly increases the number of metazoan protein interactions reported to date (Supplementary Table 3), covering roughly 10–25% of the estimated conserved animal cell interactome and opening up many new avenues of inquiry.
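The projection step can be sketched as follows: an interaction between two human proteins is transferred to another species when both partners have an orthologue there. For simplicity, the sketch assumes one-to-one orthologue tables (real orthology assignments can be one-to-many), skips pairs whose partners collapse onto a single orthologue, and uses hypothetical protein and species identifiers.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def project(human_pairs: Iterable[Tuple[str, str]],
            orthologues: Dict[str, str]) -> Set[FrozenSet[str]]:
    """Transfer each human interaction whose two partners both have an orthologue."""
    projected = set()
    for a, b in human_pairs:
        if a in orthologues and b in orthologues:
            oa, ob = orthologues[a], orthologues[b]
            if oa != ob:  # paralogues mapping to one orthologue would give a self-pair
                projected.add(frozenset((oa, ob)))
    return projected


def interactions_per_species(human_pairs: Iterable[Tuple[str, str]],
                             orthologue_tables: Dict[str, Dict[str, str]]) -> Dict[str, int]:
    """Count projected interactions in each species."""
    pairs = list(human_pairs)  # materialize so the pairs can be reused for every species
    return {species: len(project(pairs, table)) for species, table in orthologue_tables.items()}


if __name__ == "__main__":
    human = [("H1", "H2"), ("H2", "H3"), ("H3", "H4")]
    tables = {
        "species_x": {"H1": "X1", "H2": "X2", "H3": "X3", "H4": "X4"},
        "species_y": {"H1": "Y1", "H2": "Y2"},  # H3 and H4 lack orthologues here
    }
    print(interactions_per_species(human, tables))  # {'species_x': 3, 'species_y': 1}
```

Under this rule, the number of projected interactions per species falls as component retention falls, which is the pattern summarized across clades in Fig. 2e.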

Figure 2 | Derivation and projection of protein co-complex associations across taxa. a, Expanded coverage via experimental scale-up relative to our previous human study. The chart shows the number of proteins detected, most (63%) in two or more species. b, Performance benchmarks, measuring the precision and recall of our method and data in identifying known co-complex interactions (annotated human complexes from CORUM). Complexes were split into training and withheld test sets; fivefold cross-validation against 4,528 interactions derived from the withheld test set shows strong performance gains beyond the baselines achieved using co-fractionation or external evidence alone. TP, true positive; FP, false positive; FN, false negative. c, Plots showing high enrichment (probability ratio of interacting) of predicted interacting orthologous protein pairs (relative to non-interacting pairs) among highly correlated fractionation profiles, in both the holdout validation (test, T) and input species (colours reflect clade membership). d, Left, representative co-fractionation data (normalized spectral counts shown for portions of 3 of 42 experimental profiles) from human, fly and sea urchin, showing characteristic profiles of the proteasome core, base and lid sub-complexes. Hierarchical clustering (right) of pan-species pairwise Pearson correlation scores (centre) is consistent with accepted structural models (Protein Data Bank ID: 4CR2; core, red; base, blue; lid, green; out-clusters, white). e, Projection of conserved co-complex interactions across 122 eukaryotic species, indicating overlap with leading public PPI reference databases. STRING bars indicate excess over CORUM; GeneMANIA bars indicate excess over both; component and interaction occurrences across clades are indicated at the bottom.

Figure 3 | Prevalence of conservation of protein complexes across Metazoa and beyond. a, Conserved multiprotein complexes, identified by clustering, arranged according to average estimated component age (see Supplementary Methods and ref. 25). Proteins (nodes) are classified as metazoan (green) or ancient (orange); assemblies showing divergent phylogenetic trajectories are termed 'mixed'. b, Example complexes with different proportions of old and new subunits. c, Presumed origins of metazoan (new), mixed and old complexes; '?' indicates variable origins of new genes. d, Heat map showing the prevalence of selected complexes across phyla. Colour reflects the fraction of components with detectable orthologues (absence, dark blue). Sea anemone (N. vectensis) is the most distant metazoan (cnidarian) analysed biochemically.

To systematically define evolutionarily conserved functional modules, we partitioned the interaction network using a two-stage clustering procedure (Fig. 1c; see Supplementary Methods) that allowed proteins to participate in multiple complexes (that is, moonlighting) where merited (Extended Data Fig. 3a). The 981 putative multiprotein groupings (Fig. 3a; see Supplementary Table 4) include both well-known and novel complexes linked to diverse biological processes (Extended Data Fig. 3b). The complexes have estimated component ages spanning from ~500 million years (metazoan-specific, or 'new') to over one billion years (ancient, or 'old') of evolutionary divergence. Details of species, orthologues, taxonomic groups, protein ages and evolutionary distances are provided in Supplementary Tables 3 and 5 and Supplementary Methods.

Although proteins arising in metazoa (by gene duplication or other means) account for about three-quarters of all human gene products, they form only about a third (39%; 147) of the clusters (Fig. 3a). These 'new' complexes tend to be smaller (≤3 components; Fig. 3b) and specific (their components are not present in 'mixed' complexes). This indicates that although protein number and diversity greatly increased with the rise of animals, most stable protein complexes were inherited from the unicellular ancestor and subsequently modified slightly over time (Fig. 3c and Supplementary Table 5). Indeed, the dominant phylogenetic profile of complexes across Eukarya (Fig. 3d) is composed either entirely (344 old complexes) or predominantly (490 mixed complexes) of ancient subunits ubiquitous among eukaryotes (Extended Data Fig. 4a; see Supplementary Table 5 for details), the latter presumably reflecting preferential accretion of additional components to pre-existing macromolecules (Fig. 3c).

These primordial complexes are present throughout the 
Opisthokonta supergroup (animals and fungi), estimated to be more 
than one billion years old’’, and plants (and presumably lost/signifi- 
cantly diverged among parasitic protists). Reflecting this central 
importance, these complexes tend strongly to be ubiquitously 
expressed throughout all cell types and tissues (Extended Data Fig. 
5a), are abundant (Extended Data Fig. 5b), and are enriched for 
associations to human disease and perturbation phenotypes in C. 
elegans (Supplementary Table 6). In comparison with other proteins 
in the 16,655 interactions, the older, conserved proteins present in 
these stable complexes have lower average domain complexity 
(P < 0.02; see Supplementary Methods), suggesting that multi-domain
architectures underlie more transient or tissue-specific interactions. 
Whereas mixed and old complexes are enriched for functional asso- 
ciations with core cellular processes, such as metabolism (Extended 
Data Fig. 4c), the strictly metazoan complexes were far more likely to 
be linked to cell adhesion, organization and differentiation, consist- 
ent with roles in multicellularity. Reflecting these different evolu- 
tionary trajectories, new clusters are substantially more enriched for 
cancer-related proteins (42%; 62/147; hypergeometric P = 1 X 10 °) 
compared with strictly old (15%; 53/344; P = 1 × 10^-3) clusters
(Z-test P < 0.0001; Supplementary Table 7), have generally lower
annotation rates (Extended Data Fig. 4b), and show different pre- 
ponderances of protein domains (Extended Data Fig. 4c and 
Supplementary Table 6). 
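The enrichment comparison above combines a per-class hypergeometric test with a Z-test between the new and old cluster classes. The Python sketch below shows both calculations under stated assumptions: the background totals needed for the hypergeometric test are not given in this section, so that function is shown generically, and the "Z-test" is assumed to be a pooled two-proportion z-test. Function names are hypothetical; only the counts 62/147 and 53/344 come from the text.

```python
import math


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    numerator = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, top + 1)
    )
    return numerator / math.comb(population, draws)


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test (an assumed reading of 'Z-test'); returns (z, two-sided P)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided standard-normal tail: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


# Counts quoted in the text: 62 of 147 new clusters vs 53 of 344 old clusters.
z, p = two_proportion_z(62, 147, 53, 344)
print(f"z = {z:.2f}, two-sided P = {p:.1e}")
```

With these counts the statistic is roughly z ≈ 6.4, far beyond the P < 0.0001 threshold reported; the exact test used by the authors is described only in the Supplementary Methods.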


Independent biological assessment 


We used multiple approaches to assess the accuracy (Fig. 4) and 
functional significance (Fig. 5) of the predicted complexes. First, we 
performed affinity purification mass spectrometry (AP/MS) experiments
on selected novel complexes from the new, old and mixed age
clusters, validating most associations in both worm and human 
(Fig. 4a and Extended Data Fig. 6a). We next performed a global 
validation by comparing our derived complexes to a newly reported 
large-scale AP/MS study of 23,756 putative human protein interac- 
tions detected in cell culture (E. L. Huttlin et al., BioGRID preprint 
166968), and observed a partial, but highly statistically significant, 
overlap to a degree comparable to literature-derived complexes 
(Fig. 4b, Extended Data Fig. 6b). 
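The overlap with the Huttlin et al. AP/MS data was judged against shuffled controls (Fig. 4b). The three complex-level agreement measures are defined only in the Supplementary Methods, so the Python sketch below substitutes one simple statistic as an assumption: the number of within-complex protein pairs that also appear in a reference interaction set, with a null distribution built by reshuffling protein labels while keeping every complex size fixed. All names and the toy data are hypothetical.

```python
import itertools
import random


def co_complex_pairs(complexes):
    """Unordered protein pairs (sorted tuples) that share at least one complex."""
    pairs = set()
    for members in complexes:
        for a, b in itertools.combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs


def overlap(complexes, reference_pairs):
    """Number of co-complex pairs that also appear in the reference set."""
    return len(co_complex_pairs(complexes) & reference_pairs)


def shuffle_test(complexes, reference_pairs, n_perm=1000, seed=0):
    """Observed overlap and a one-sided permutation P value.

    The null reassigns protein labels at random while keeping each
    complex size fixed.
    """
    rng = random.Random(seed)
    observed = overlap(complexes, reference_pairs)
    labels = [p for members in complexes for p in members]
    sizes = [len(members) for members in complexes]
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(labels[start:start + size])
            start += size
        if overlap(shuffled, reference_pairs) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical toy data: three predicted complexes and a reference pair set.
predicted = [["A", "B", "C"], ["D", "E", "F"], ["G", "H"]]
reference = co_complex_pairs([["A", "B", "C"], ["D", "E"], ["G", "H"]])
print(shuffle_test(predicted, reference))
```

Keeping complex sizes fixed matters: larger complexes generate more pairs, so a null that ignored sizes would understate the overlap expected by chance.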

We also observed broad agreement between the derived complexes’
inferred molecular weights (assuming 1:1 stoichiometries) and migra- 
tion by size-exclusion chromatography (Fig. 4c and Extended Data 
Fig. 7a) and density gradient centrifugation (Extended Data Fig. 7b). 
A prime example is provided by the coherent profiles of a large (~500 kDa)
mixed complex with several un-annotated components (Fig. 4d and 
Extended Data Fig. 8), dubbed ‘Commander’, because most 
subunits share COMM (copper metabolism MURR1) domains” impli- 
cated in copper toxicosis*’, among other roles*”*’. Commander con- 
tains coiled-coil domain proteins CCDC22 and CCDC93 (Figs 4a, d) 
in addition to ten COMM domain proteins, broadly supported 
by co-fractionation in human, fly and sea urchin (Extended Data 
Fig. 9a-c and supporting website, http://metazoa.med.utoronto.ca/ 
php/view_elution_image.php?id=71&cond=ms2). 
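Because the inferred molecular weights assume 1:1 stoichiometries, each inferred mass is simply the sum of one copy of each subunit mass. The Python sketch below shows that arithmetic plus a crude check against an apparent size read off a size-exclusion profile; the 1.5-fold tolerance, the function names and the subunit masses are illustrative assumptions, not values from this study.

```python
def inferred_mw_kda(subunit_mw_kda):
    """Complex mass assuming one copy of each subunit (1:1 stoichiometry)."""
    return sum(subunit_mw_kda.values())


def broadly_consistent(inferred_kda, apparent_kda, fold=1.5):
    """True if the apparent (e.g. SEC-derived) size lies within `fold` of the inferred mass."""
    ratio = apparent_kda / inferred_kda
    return 1 / fold <= ratio <= fold


# Hypothetical subunit masses in kDa (placeholders, not measured values).
subunits = {"subunit_1": 120.0, "subunit_2": 80.0, "subunit_3": 50.0}
total = inferred_mw_kda(subunits)
print(total, broadly_consistent(total, apparent_kda=300.0))
```

A complex containing two copies of some subunit, or an elongated particle that migrates anomalously, would depart from its 1:1 estimate, which is one reason only broad agreement is expected.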

We found an unexpected role in embryonic development for 
Commander, whose subunits are strongly co-expressed in devel- 
oping frog (Extended Data Fig. 9d, e). COMMD2/3-knockdown 
(morpholino) tadpoles showed impaired head and eye development 
(Fig. 5a and Extended Data Fig. 9f, h), and defective neural pattern- 
ing and expression changes in brain markers PAX6, EN2 and 
KROX20/EGR1 (Fig. 5b and Extended Data Fig. 9g, h). Given the
recently discovered link**** between CCDC22 and human syn- 
dromes of intellectual disability, malformed cerebellum and craniofa- 
cial abnormalities, the deep conservation of the Commander complex 
suggests COMMD2/3 as strong candidates in the aetiology of these 
heterogeneous disorders. 
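The eye-size comparisons in Fig. 5a are reported with a two-sided Mann-Whitney test. The Python sketch below shows one way to compute that test using the normal approximation, with mid-ranks and a tie-corrected variance but no continuity correction. The authors' software and exact variant are not stated here, small samples would normally call for an exact test, and the eye-size values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Ties receive mid-ranks and the variance is tie-corrected; no continuity
    correction is applied. Returns (U for sample x, two-sided P).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = sorted(list(x) + list(y))
    mid_rank = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        mid_rank[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u_x = sum(mid_rank[v] for v in x) - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value tied: no evidence of a difference
        return u_x, 1.0
    z = (u_x - n1 * n2 / 2) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical eye sizes (arbitrary units) for control and knockdown tadpoles.
control = [1.00, 0.97, 1.03, 0.99, 1.02, 0.98]
knockdown = [0.80, 0.76, 0.85, 0.79, 0.90, 0.83]
print(mann_whitney_u(control, knockdown))
```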

Among metazoan-specific protein complexes, we confirmed 
physical and functional associations of spindle checkpoint protein 
BUB3 with ZNF207, a zinc-finger protein conspicuously lacking 
orthologues in cnidarians and fungi. ZNF207 binds Bub3 via a 
Gle2-binding-sequence (GLEBS) motif restricted to deuterostomes 
and protostomes (Extended Data Fig. 10a). As in human, knockdown 
of the ZNF207 orthologue in C. elegans (B0035.1) enhanced lethality 
owing to impaired Bub3-mediated checkpoint arrest (Fig. 5c). 

Among mixed complexes, we confirmed metazoan-specific coiled-coil domain protein CCDC97 as a sub-stoichiometric component of the human and worm SF3B spliceosomal complex involved in branch-site recognition (Fig. 4a). Consistent with a possible role in pre-mRNA splicing, CRISPR-based CCDC97-knockout human cells grew more slowly than control lines (Extended Data Fig. 10b, c) and were hypersensitive to pladienolide B (Fig. 5d), a macrolide inhibitor of SF3b**.

Figure 4 | Physical validation of complexes. a, Verification of complexes from tagged human cell lines and transgenic worms (see Supplementary Methods; complexes drawn as in Fig. 3). Inset reports spectral counts obtained in replicate AP/MS analyses of indicated bait protein (header). MIB2-VPS4 complex confirmed by co-immunoprecipitation (co-IP; Extended Data Fig. 6a). b, Conserved complexes significantly overlap large-scale AP/MS data reported for human cell lines (E. L. Huttlin et al., BioGRID preprint 166968) to an extent comparable to literature reference sets*’’’, using three measures of complex-level agreement (see Supplementary Methods, Extended Data Fig. 6b); ***P < 0.001, determined by shuffling (grey distributions). c, Agreement of inferred molecular weights (MW) of human protein complexes with size-exclusion chromatography profiles (data in c, d, from ref. 43). d, Co-elution of human Commander complex subunits by size-exclusion chromatography consistent with an approximately 500-kDa particle.


Network perspective into conserved biological systems 


Knowledge of conserved macromolecular associations provides a road map for additional functional inferences. For instance, fractionation profiles can be compared for any pair of proteins in our data set to search for evidence of interactions. We found significant enrichment for interactions among pairs of human proteins acting sequentially in annotated pathways” (Fig. 5e), especially G-protein and MAP-kinase cascades (Supplementary Table 8). Enzymes acting consecutively in core metabolic reactions (Fig. 5f) also showed a higher tendency to interact (Supplementary Table 8), the significance of which decayed with more intervening steps (Fig. 5e). For example, strong consecutive interactions were apparent within the widely conserved purine biosynthetic pathway, with enzymes (for example, PAICS, GART) eluting in two peaks (Fig. 5g), one coincident with the prior enzyme and the second with the downstream enzyme, suggestive of substrate channelling’.

Figure 5 | Functional validation of complexes. a, Morpholino (MO(ATG), targeting start codon to block translation) knockdown of COMMD2 (n = 55 animals, 2 clutches, 1 eye each) or COMMD3 (n = 64) in X. laevis embryos causes defective head and eye development (control n = 57; Extended Data Fig. 9f, h). ***P < 0.0001, 2-sided Mann-Whitney test. b, COMMD2/3 knockdown animals (five embryos per treatment examined) show altered neural patterning, including posterior shift or loss of expression of mid-brain marker EN2 and KROX20 (EGR1), the latter in rhombomeres R3/R5 (compare to Extended Data Fig. 9g, h). c, Enhanced embryonic lethality (epistasis) following RNAi knockdown in C. elegans of B0035.1 (ZNF207) and bub-3 together (eggs laid: HT115, 1,308; B0035.1, 1,096; bub-3, 445; bub-3 + B0035.1, 341). d, Enhanced sensitivity (mean ± s.d. across four cell culture experiments) of two independent CCDC97-knockout lines to the SF3b inhibitor pladienolide B (PB) relative to control HEK293 cells. e, Enrichment (permutation test P value) for interactions among sequential pathway components and metabolic enzymes relative to shuffled controls (n refers to enzyme index, where n, n + 1 denotes sequential enzymes, n, n + 2 sequential-but-one, and so on, as described in Supplementary Information). f, Metabolic channelling as opposed to traditional (typical) two-step cascade model. g, Conserved interactions among consecutively acting enzymes involved in purine biosynthesis (two representative co-fractionation profiles of the 69 total generated are shown).
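The comparison of fractionation profiles for enzymes separated by k pathway steps (n, n + 1; n, n + 2; and so on) can be sketched in Python as follows. This is a simplification under stated assumptions: each pair is scored by the Pearson correlation of its elution profiles and called co-eluting above a hypothetical threshold of 0.8, whereas the study scored interactions with its own pipeline and assessed enrichment by permutation against shuffled controls. Enzyme names and profiles are toy data.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length elution profiles (0.0 if either is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def co_elution_by_step(pathway, profiles, max_step=3, threshold=0.8):
    """Fraction of enzyme pairs (n, n + k) whose profiles correlate at >= threshold."""
    result = {}
    for k in range(1, max_step + 1):
        pairs = [(pathway[i], pathway[i + k]) for i in range(len(pathway) - k)]
        if pairs:
            hits = sum(pearson(profiles[a], profiles[b]) >= threshold for a, b in pairs)
            result[k] = hits / len(pairs)
    return result


# Hypothetical four-step pathway E1 -> E2 -> E3 -> E4 over eight fractions.
profiles = {
    "E1": [0, 5, 10, 5, 0, 0, 0, 0],
    "E2": [0, 4, 9, 6, 1, 0, 0, 0],
    "E3": [0, 0, 0, 0, 1, 6, 10, 4],
    "E4": [0, 0, 0, 0, 0, 5, 10, 5],
}
print(co_elution_by_step(["E1", "E2", "E3", "E4"], profiles))
```

A permutation test would then compare each step's fraction with the fractions obtained from randomly paired enzymes; a decay from k = 1 to larger k mirrors the trend described for Fig. 5e.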

Despite the diversity of multicellular organisms, our study reveals 
fundamental attributes of the macromolecular machinery of animal 
cells with near-universal pertinence to metazoan biology, development
and evolution. Our extremely large set of supporting biochemical
fractionation data (via ProteomeXchange with identifiers
PXD002319-PXD002328), PPIs (via BioGRID; http://thebiogrid.org/185267/publication/)
and interaction network projections are fully accessible
(http://metazoa.med.utoronto.ca) to facilitate in-depth
exploration. Although we focused on global conservation
properties, these data can be analysed at the individual animal species 
or complex levels to assess the variety and functional adaptations of 
particular protein assemblies across phyla. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
Extended Data Figure 1 | Performance measures. a, Performance benchmarks, measuring the precision and recall of our method and data in identifying known co-complex interactions from a withheld reference set of annotated human complexes (from CORUM”; as in Fig. 2b). Fivefold cross-validation against this withheld set shows strong performance gains from integrating co-fractionation data from the additional non-human animal species (as indicated), beyond a baseline achieved using only human and mouse co-fractionation data along with additional evidence from independent protein interaction screens””? and a functional gene network” (far-left curve). ‘All data’ and ‘Fractionation data only’ curves include biochemical fractionation data from all five input species: human, mouse, urchin, fly and worm; the latter curve omits all external data. In all cases, at least two species were required to show supporting biochemical evidence. Recall refers to the fraction of 4,528 total positive interactions derived from the withheld human CORUM complexes. b, All 16,655 interactions were identified in at least two species, with about half (49%; 8,121) found in three or more species. c, Among these high-confidence co-complex interactions, 8,981 (54%) were not reported in iRefWeb“ (v13.0), BioGRID* (v3.2.119) or CORUM reference (Supplementary Table 2) for any of the five input species or in yeast; about half (46%; 4,128) of these novel co-complex interactions display evidence of co-fractionation in three or more species. d, Final precision/recall performance on withheld interaction test set. A support vector machine classifier was trained using interactions derived from our training set of CORUM complexes, then ~1 million protein pairs found to co-elute in at least two of the five input species were scored by the classifier. Black curve shows precision and recall for the ranked list of co-eluting pairs, with recall representing the fraction recovered of 4,528 total positive interactions derived from the withheld set of merged human CORUM complexes, and precision measured using co-eluting pairs where both members of the pair are contained in the set of proteins represented in the CORUM withheld set. The top 16,655 pairs, giving a cumulative precision of 67.5% and recall of 23.0% on this withheld test set, form the high-confidence set of co-complex protein-protein interactions (blue circle). The highest-scoring interactions were clustered using the two-stage approach described in the Supplementary Methods, yielding a final set of 7,669 interactions, which form the 981 identified complexes (red circle; precision = 90.0%, recall = 20.8%).

44. Turner, B. et al. iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database 2010, baq023 (2010).


45. Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids 
Res. 34, D535-D539 (2006). 
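The ranking-based benchmark of Extended Data Fig. 1d, with recall taken over all withheld positives and precision computed only over pairs whose two members both belong to the withheld reference set, can be illustrated with a short Python sketch. The function name and toy pairs are hypothetical; the classifier scores that produce the ranking are assumed to be given.

```python
def precision_recall_curve(ranked_pairs, positives, reference_proteins):
    """Cumulative (precision, recall) after each pair in a score-ranked list.

    recall    = true positives / all positive pairs
    precision = true positives / ranked pairs whose two members are both in
                reference_proteins (other pairs cannot be judged and are skipped)
    """
    tp = judged = 0
    curve = []
    for a, b in ranked_pairs:
        if a in reference_proteins and b in reference_proteins:
            judged += 1
            if tuple(sorted((a, b))) in positives:
                tp += 1
        precision = tp / judged if judged else 0.0
        curve.append((precision, tp / len(positives)))
    return curve


# Hypothetical toy data: pairs ranked by classifier score, best first.
positives = {("A", "B"), ("C", "D")}
reference_proteins = {"A", "B", "C", "D"}
ranked = [("B", "A"), ("A", "X"), ("A", "C"), ("C", "D")]
for precision, recall in precision_recall_curve(ranked, positives, reference_proteins):
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

The blue circle corresponds to reading such a curve at rank 16,655; a recall of 23.0% of the 4,528 withheld positives amounts to roughly 1,040 recovered reference interactions.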
Extended Data Figure 2 | Properties of protein elution profiles. a, Distribution of global protein tissue expression pattern similarity, measured as the Pearson correlation coefficient of protein abundance across 30 human tissues*’, showing markedly higher correlations for 16,468 protein-protein pairs of putative co-complex interaction partners compared to the same number of randomized pairs of proteins in the network which were not predicted to interact. b, Heat map illustrating the low to moderate cross-species Spearman’s rank correlation coefficients in the elution profiles observed between orthologous proteins during mixed-bed ion exchange chromatography under standardized conditions, highlighting the shift in absolute chromatographic retention times in different species. This variation indicates that the conservation of co-fractionation by putatively interacting proteins is not merely a trivial result stemming from fixed column-retention times. c, The degree of co-fractionation is measured as the correlation coefficient between elution profiles. Spatial proximity is calculated from the mean of residue pair distances between components of multisubunit complexes with known three-dimensional structures (see Supplementary Methods).
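The argument in Extended Data Fig. 2b is that orthologues elute at shifted positions in different species, so conserved co-fractionation of partners is not a trivial consequence of fixed retention times. The Python sketch below illustrates the effect with hypothetical profiles: Spearman's rank correlation (computed as the Pearson correlation of average ranks) is 1 for a profile against itself but drops sharply when the same peak is shifted by three fractions. Names and values are illustrative only.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rank correlation = Pearson correlation of average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical elution profile (ten fractions) and the same peak eluting
# three fractions later, as a species-specific retention shift might cause.
profile = [0, 1, 4, 9, 4, 1, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 4, 9, 4, 1, 0]
print(spearman(profile, profile), spearman(profile, shifted))
```

In this toy case the shifted copy correlates negatively with the original despite an identical peak shape, which is why correlation between partners within each species, rather than raw profile identity across species, is the informative quantity.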
Extended Data Figure 3 | Derivation of complexes. a, The 2,153 proteins present in the 981 derived metazoan complexes participate in multiple assemblies (‘moonlighting’) to an extent comparable to the sharing of subunits reported for literature-derived complexes (CORUM). For comparison, we examined the 1,550 unique proteins from the full CORUM set of 1,216 human complexes passing our selection criteria for supporting evidence (‘Unmerged’) and the 1,461 unique proteins from the non-redundant set of 501 merged complexes used as the reference for splitting our training and testing sets, with some of the largest complexes removed to avoid bias in training (‘Merged’; see ‘Optimizing the two-stage clustering’ in Supplementary Methods for details). b, Schematic of 981 identified complexes containing 2,153 unique proteins. In this graphical representation, 7,669 co-complex interactions are shown as lines, and proteins as nodes. Red and green interactions were previously annotated in CORUM. Red interactions were used in training the classifier and/or clustering procedure, while green interactions were held out for validation purposes. Grey interactions were not previously annotated in CORUM.
Extended Data Figure 4 | Properties of new and old proteins and complexes. a, The 2,153 protein components in the conserved animal complexes tend to be more ancient than the 2,301 proteins reported in the CORUM reference complexes or in two recent large-scale protein interaction assays, based on either the 7,062 proteins found by affinity purification/mass spectrometry (AP/MS; E. L. Huttlin et al., BioGRID preprint 166968, http://thebiogrid.org/166968/publication/) or the 3,667 proteins analysed by yeast two-hybrid assays (Y2H)'°. Ages are derived from OMA (Orthologous Matrix database) as in ref. 25. b, Annotation rates (mean count of annotation terms per protein) of old and new proteins in the derived complexes and pairwise PPIs, compared with proteins in the CORUM reference complex set. Old proteins (defined by OMA) from the complexes generally exhibited higher annotation rates than new proteins. c, Differential enrichment of old, mixed and metazoan-specific protein complexes for functional annotations (select GO-slim biological process terms shown, top) and protein domains (Pfam, bottom).
Extended Data Figure 5 | Abundance and expression trends for proteins in complexes. Proteins within the identified complexes tend to be ubiquitously expressed across human tissues. a, b, Pie charts show the proportions of proteins with varying tissue expression patterns, from a recently published human tissue proteome map*’, comparing the full set of 20,258 human proteins (a) with the 2,131 proteins within the identified complexes (b). Consistent with these observations, 91% of the protein components in the complexes were expressed in >15 tissues in data from a reference human proteome”*, compared to less than half (46%) of the 17,294 proteins in the overall reference set (Z-test P < 0.001). c, d, The distributions of average mRNA (c, data from EBI accession E-MTAB-1733) and protein (d, data from PaxDb integrated data set, 9606-H.sapiens_whole_organism-integrated_data set) abundances for all proteins identified and those within complexes. Evolutionarily old proteins (defined by OMA as described in ref. 25 and mentioned earlier) tend towards higher abundances, even for proteins in reference complexes.


46. Uhlen, M. et al. Tissue-based map of the human proteome. Science 347, 6220 
(2015). 
Extended Data Figure 6 | Additional validation data. a, Confirmation of MIB2 interactions by co-immunoprecipitation. Extract (~10 mg protein) from cultured human HCT116 cells expressing Flag-tagged MIB2 or control (WT) cells was incubated with 100 µl anti-Flag M2 resin for 4 h while gently rotating at 4 °C. After extensive washing with RIPA buffer, co-purifying proteins bound to the beads were eluted by the addition of 25 µl Laemmli loading buffer at 95 °C. Polypeptides were separated by SDS-PAGE and immunoblotted using Flag, VPS4A, VPS4B or IST1 antibodies as indicated (expanded gel images provided in Supplementary Information). b, Protein co-complex interactions reported in the CYC2008 yeast protein complex database” are reconstructed accurately from the co-fractionation data, regardless of whether the full set of co-fractionation plus external data are used to derive protein interactions (‘All data’, see also Fig. 4b) or the external yeast data were specifically excluded from the analyses (‘All data, excluding yeast’).
Extended Data Figure 7 | Agreement of derived complexes’ molecular weights with measurement by HPLC and density centrifugation. a, CORUM reference complexes’ inferred molecular weights (MW) are consistent with their components’ average cumulative size-exclusion chromatograms. The molecular weight of each complex was calculated as the sum of putative component molecular weights, assuming 1:1 stoichiometry. Data from ref. 43 were analysed as in Fig. 4c and show a similar trend as for the derived complexes. b, Derived complexes’ inferred molecular weights are broadly consistent with their components’ average cumulative ultracentrifugation profiles on a sucrose density gradient. Average profiles are plotted for X. laevis orthologues, based on a preparation of haemoglobin-depleted heart and liver proteins separated on a 7-47% sucrose density gradient, as described in the Supplementary Methods.
Extended Data Figure 8 | Distribution of uncharacterized proteins and novel interactions across the 981 derived complexes. Complexes were sorted by median age (defined by OMA). Among 2,153 unique proteins, 293 (red) lack Gene Ontology (GO) functional annotations, while 1,756 of 7,665 co-complex interactions are novel (light green) (not listed in iRefWeb curation database).
Extended Data Figure 9 | Properties of the Commander complex. The automatically derived 8-subunit Commander complex (Fig. 3b) was subsequently extended to 13 subunits (COMMD1 to 10, CCDC22, CCDC93, and SH3GLB1) based on combined analysis of AP/MS (Fig. 4a), size-exclusion chromatograms” (Fig. 4d), published pairwise interactions*’*”*, and analysis of elution profiles of the remaining COMM-domain-containing proteins, as shown here. Example protein elution profiles are plotted for Commander complex subunits observed from: HEK293 cell nuclear extract (a); sea urchin embryonic (5 days post-fertilization) extract (b); and fly SL2 cell nuclear extract (c); each fractionated by heparin affinity chromatography. d, Co-expression of Commander complex subunits during embryonic development of X. tropicalis (plotting mean ± s.d. of three clutches; data from ref. 49). e, Messenger RNA expression patterns of Commander complex subunits in stage 15 X. laevis embryos. Images show coordinated spatial expression in early vertebrate embryogenesis, as measured by in situ hybridization (three embryos examined). f, Knockdown of Commd2 induced marked head and eye defects in developing X. laevis. Top, Commd2 antisense knockdown significantly decreased eye size, shown for stage 38 tadpoles (from three clutches; control n = 47 animals, one eye each; ***P < 0.0001, two-sided Mann-Whitney test); phenotypes were consistent between translation-blocking (MOatg; n = 60) and splice-site-blocking (MOsp; n = 50) morpholinos, and with knockdowns of interaction partner Commd3 (see Fig. 5a). Bottom, Commd2 knockdown induced altered Pax6 patterning in the embryonic eye (control n = 8 animals, two eyes each; MO n = 11). g, Commd2/3-knockdown animals show altered neural patterning. Changes in stage 15 X. laevis embryos, measured by in situ hybridization (assayed in duplicate; five embryos per treatment), were seen upon knockdown but not in controls: the forebrain marker PAX6 was expanded, while the mid-brain marker EN2 was strongly reduced. Notably, while expression of KROX20/EGR1 in rhombomere R3 was shifted posteriorly, expression in R5 was strongly reduced or entirely absent. Panels in Fig. 5b are reproduced from this figure and are directly comparable. h, Confirmation of splice-blocking Commd2 morpholino activity. Images and schematic show the basis and results of RT-PCR and agarose gel electrophoresis obtained with the corresponding X. laevis knockdown tadpoles.

47. de Bie, P. et al. Characterization of COMMD protein-protein interactions in NF-κB signalling. Biochem. J. 398, 63-71 (2006).

48. Phillips-Krawczak, C. A. et al. COMMD1 is linked to the WASH complex and regulates endosomal trafficking of the copper transporter ATP7A. Mol. Biol. Cell 26, 91-103 (2015).

49. Yanai, I., Peshkin, L., Jorgensen, P. & Kirschner, M. W. Mapping gene expression in two Xenopus species: evolutionary constraints and developmental flexibility. Dev. Cell 20, 483-496 (2011).
Extended Data Figure 10 | Supporting data for BUB3 and CCDC97
independent lines of human HEK293 cells, as verified by western blotting (expanded gel images provided in Supplementary Information). c, Loss of
The mechanism of DNA replication termination in vertebrates


James M. Dewar¹, Magda Budzowska¹ & Johannes C. Walter¹,²


Eukaryotic DNA replication terminates when replisomes from adjacent replication origins converge. Termination involves 
local completion of DNA synthesis, decatenation of daughter molecules and replisome disassembly. Termination has been 
difficult to study because termination events are generally asynchronous and sequence nonspecific. To overcome these 
challenges, we paused converging replisomes with a site-specific barrier in Xenopus egg extracts. Upon removal of the 
barrier, forks underwent synchronous and site-specific termination, allowing mechanistic dissection of this process. We 
show that DNA synthesis does not slow detectably as forks approach each other, and that leading strands pass each other 
unhindered before undergoing ligation to downstream lagging strands. Dissociation of the replicative CMG helicase 
(comprising CDC45, MCM2-7 and GINS) occurs only after the final ligation step, and is not required for completion of 
DNA synthesis, strongly suggesting that converging CMGs pass one another and dissociate from double-stranded DNA. 
This termination mechanism allows rapid completion of DNA synthesis while avoiding premature replisome disassembly.


DNA replication occurs in three broad stages: initiation, elongation and 
termination. Termination occurs when converging replication forks 
meet and involves at least four processes, not necessarily in the follow- 
ing order. First, the last stretch of parental DNA between forks is 
unwound (dissolution) and replisomes come into contact; second, 
any remaining gaps in the daughter strands are filled in and nascent 
strands are ligated (ligation); third, double-stranded (ds)DNA inter- 
twinings (that is, catenanes) are removed (decatenation); fourth, the 
replisome is disassembled. Despite decades of research on termination’, 
we know little about the order, mechanism and regulation of the above 
events, especially during eukaryotic chromosomal replication. 

Termination has been most extensively studied in the mammalian 
DNA tumour virus SV40 (ref. 2), where converging replication forks 
stall during termination’**. Dissolution during SV40 replication 
requires rotation of the entire fork to produce catenations behind 
the fork (pre-catenanes)**, which are resolved by topoisomerase 
(Topo) II (ref. 6), probably in a manner similar to how Topo IV 
functions during bacterial termination”®. The SV40 replicative 
helicase, large T antigen, dissociates from chromatin before dissolu- 
tion, but whether this is required for the completion of replication 
is unknown”. After dissolution, daughter strands retain gaps of 
~60 nucleotides", which are ultimately filled in by an unknown 
mechanism in parallel to decatenation”. 

Eukaryotic termination has also been investigated. Although convergent
forks accumulate at certain replication pause sites in yeast
cells lacking 5′-3′ DNA helicases’**, it is unknown whether forks
stall during unperturbed termination. Furthermore, Topo II is not
required for dissolution in budding yeast'*'” or during vertebrate
termination’*'’. Recent work shows that late in S phase, the eukaryotic
replicative helicase CMG” is removed from chromatin by the
ATPase p97 after ubiquitylation of MCM7 (by SCF^Dia2 in yeast)**°.
While one study implied that DNA replication can go to completion
in the absence of CMG unloading”, another reported that tracts of
unreplicated DNA remain in the absence of this process”’. Given that
mis-regulation of bacterial termination can readily trigger re-replication
of DNA’*”’, a potent driver of genomic instability in mammalian cells”,
a better understanding of eukaryotic termination is essential.

Owing to stochastic origin firing””° and variable rates of replisome 
progression*'”’, the location and timing of eukaryotic termination are
variable*”*’, making this process difficult to study. Here we report that 
Xenopus egg extracts can be used to induce synchronous and localized 
termination events. This approach has allowed us to identify and 
order key events underlying vertebrate termination. 


A system to study replication termination 


Our strategy was to stall forks on either side of a reversible replication 
fork barrier (Fig. 1a, panels i-iii), and subsequently disassemble the 
barrier to trigger localized and synchronous termination events 
(Fig. 1a, panel iv). The barrier that we employed consisted of an array 
of lac repressors (LacRs) bound to lac operators (lacOs)***, which can
be disrupted by IPTG. We constructed p[lacO×16], which contains 16
tandem copies of lacO (490 base pairs (bp)). p[lacO×16] was incubated
in nucleus-free Xenopus egg extract, which promotes sequence-nonspecific
replication initiation on added DNA molecules, followed
by a single, complete round of DNA synthesis via a mechanism that
appears to reflect events in cells*®. To monitor replication, radioactive
[α-32P]dATP was included in the reaction. When p[lacO×16] was
replicated in the absence of LacR for ~5 min and then cut with XmnI
(Fig. 1a, panel iii), a single linear species representing fully replicated
daughter molecules was observed (Fig. 1c, lane 1). In contrast, in the
presence of LacR, a slow-mobility product appeared (Fig. 1c, lane 4)
that corresponds to a double-Y structure, as shown by 2D gel
electrophoresis (Extended Data Fig. 1a). To confirm that the double-Y
resulted from fork stalling at the outer edges of the array, we separately
monitored replication in the plasmid backbone and in the lacO array.
In the presence of LacR, synthesis of the array was specifically delayed
(Extended Data Fig. 1f). In contrast, LacR had no effect on replication
of a plasmid lacking lacO sites (Extended Data Fig. 1e). These results
indicate that replication forks stalled on both sides of the LacR array, 
consistent with previous findings****””. 


¹Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts 02115, USA. ²Howard Hughes Medical Institute, Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts 02115, USA.
We next addressed whether replication forks stalled by LacR could
restart. When IPTG was added to double-Y structures 5 min after
replication initiation, 90% were converted to unit-sized linear plasmid
molecules within a further 1.5 min (Fig. 1c, lanes 5-10 and Fig. 1h,
yellow circles). In the absence of IPTG, only 21% of double-Y
molecules disappeared after 3 min (Fig. 1c, lane 18). The conversion of
double-Y molecules to linear species occurs when any remaining 
parental DNA holding daughter molecules together is unwound 
(Fig. 1b). This process, which we refer to as ‘dissolution’, represents 
a convenient means to measure the point at which converging repli- 
somes meet. Notably, the ATR-Chk1 pathway was not activated 
above background levels during this procedure (data not shown). 

After dissolution, nascent strands should undergo ligation. To
detect the growth and ligation of nascent strands, we digested
p[lacO16] with AlwNI, which cuts the plasmid once, ~550 nucleotides
(nt) from the rightward edge of the array and ~2,000 nt from its
leftward edge (Fig. 1a, panel iii, and Fig. 1d), and we analysed the
products on a denaturing gel. Before IPTG addition, discrete species
of ~2,000 nt (Fig. 1e, lane 4) and ~550 nt (Extended Data Fig. 2a,
lane 4) were observed. Upon IPTG addition, both bands grew
heterogeneously (Fig. 1e and Extended Data Fig. 2a). Since all leading
strands were immediately extended upon IPTG addition (Extended
Data Fig. 2b, c), we infer that the observed heterogeneity arose
because growth of the lagging strand was delayed until ligation
of an additional Okazaki fragment. Finally, the nascent strands
increased abruptly to the full length of 3,100 nt as they were ligated
to downstream lagging strands (Fig. 1e, lanes 9-13). As expected,
dissolution preceded ligation, with a delay of ~45 s between
these two events (Fig. 1h).
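Delays such as the ~45 s between dissolution and ligation are read off time courses like those in Fig. 1h. The text does not state how the delay was computed; one simple option, sketched below under the assumption that each event is timed at the point where its curve first reaches 50%, is linear interpolation between sampled time points. The function names are hypothetical, and the inputs would be percentages from assays such as those described in the Methods.

```python
from typing import Sequence


def time_to_reach(times: Sequence[float], values: Sequence[float],
                  target: float = 50.0) -> float:
    """Return the first time at which a time course reaches `target`,
    interpolating linearly between the two bracketing samples."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need matching time and value lists (>= 2 points)")
    if values[0] >= target:
        return times[0]
    for i in range(1, len(times)):
        if values[i] >= target:
            # Here values[i - 1] < target <= values[i], so v1 - v0 > 0.
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            return t0 + (target - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("time course never reaches the target value")


def event_delay(times: Sequence[float], earlier: Sequence[float],
                later: Sequence[float], target: float = 50.0) -> float:
    """Delay between two events (e.g. dissolution, then ligation) in the
    units of `times`, both timed at the same target percentage."""
    return time_to_reach(times, later, target) - time_to_reach(times, earlier, target)
```

With times in minutes, multiplying the result by 60 gives the delay in seconds. The half-maximum criterion is a design choice; another threshold, or a fitted sigmoid, would give somewhat different values.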

Another important event associated with termination is decatenation
of daughter molecules. To measure this process, we analysed
undigested replication products on native agarose gels (Fig. 1f, g).
Before addition of IPTG, when the array had not yet been
duplicated, replication products migrated as a compact smear of
high-molecular-weight θ structures (Fig. 1f and Fig. 1g, lane 4).
Upon addition of IPTG, most θ structures were lost within 1 min,
and they were successively converted into three types of dimeric
catenanes described previously: nicked-nicked, nicked-supercoiled,
and supercoiled-supercoiled (Fig. 1g and Extended Data Fig. 3a).
Nicked-nicked catenanes appeared first (Fig. 1g, lanes 7, 8), followed
by nicked-supercoiled (lanes 8-10) and supercoiled-supercoiled
(Fig. 1g, lanes 9-12). Supercoiling is the result of nucleosome
assembly on closed circular DNA. Finally, monomeric, supercoiled
daughter molecules accumulated (sc, Fig. 1g, lane 17) in a Topo II-dependent
manner (Extended Data Fig. 3b-d), as seen in vivo. Topo II was not
required for dissolution or ligation (Extended Data Fig. 3c, d),
suggesting that these processes proceed independently of
decatenation. Like ligation, decatenation began ~40 s after dissolution,
but it progressed more slowly than ligation (Fig. 1h). The same
intermediates were detected in the absence of LacR, but their order
of appearance was not well defined (Extended Data Fig. 3e).

strands; sc-sc, supercoiled-supercoiled; kb,
kilobase ladder, with the size of each band (in
kilobases) labelled. h, Multiple dissolution, ligation
and decatenation assays were quantified.
Means ± standard deviation (s.d.) are plotted
(n = 4).

Our results demonstrate that a reversible replication fork barrier 
allows induction of a synchronous and spatially defined termination 
event. They also show that soon after forks meet, as measured by 
dissolution, daughter molecules are quickly ligated and decatenated. 


Converging replication forks do not stall 


To test the proposal that replication forks slow down or stall during
termination, we quantified the rate of DNA synthesis as two
replisomes converged within the lacO array. To minimize the loss of
synchrony among replisomes after IPTG addition, we used a 365-bp
array containing only 12 copies of lacO, which was sufficient to prevent
dissolution at the 5 min time point (Extended Data Fig. 4c). We
replicated p[lacO12] in the presence of LacR, added IPTG after 5 min, and
examined subsequent replication within the array by cutting the
plasmid with AflIII and PvuII (Fig. 2a). The rate of DNA synthesis within
the array was almost perfectly linear after IPTG addition (Fig. 2b, c),
even as dissolution was underway. These data suggest that converging
forks do not slow significantly before they meet. A similar conclusion
was reached when radiolabelled nucleotides were added at the same
time as IPTG, so that incorporation was measured only during the final
stage of replication on p[lacO12] (Extended Data Fig. 5a-f) or p[lacO16]
(Extended Data Fig. 5g, h). Moreover, fork rates within the lacO array
resembled those previously reported in the same egg extracts
(Extended Data Fig. 5f). These results suggest that converging
replisomes do not undergo prolonged stalling.
Figure 2 | DNA synthesis does not stall during termination. a, Cartoon
depicting the assay for lacO array synthesis. b, LacR block-IPTG release was
performed on p[lacO12]. To measure synthesis within the array, termination
intermediates were cut with AflIII and PvuII to liberate the array fragment from
the vector. Cleaved products were separated by native gel electrophoresis.
Different exposures of array and vector fragments are shown (see Methods).
c, Array synthesis, vector synthesis and dissolution were quantified.
Means ± s.d. are plotted (n = 3). kb, kilobase ladder, with the size of each band
(in kilobases) labelled.


To evaluate further whether forks slow or stall upon encounter with
a converging fork, we compared progression of leading strands into
arrays containing 12 or 32 copies of lacO, in which the
rightward fork should collide with a converging fork at the 6th or
16th lacO repeat, respectively (Fig. 3a). If converging forks interfere
with each other, the rightward leading strand should pause or stall
near the 6th repeat in p[lacO12] but not in p[lacO32]. As expected,
dissolution (Fig. 1b) happened much earlier on p[lacO12] than on
p[lacO32] (Extended Data Fig. 6a-d). To monitor leading-strand
progression into the array with near-nucleotide resolution, DNA
intermediates were purified, digested with the nicking enzyme Nt.BspQI,
which released rightward leading strands (Fig. 3a), and separated on a
denaturing polyacrylamide gel (Fig. 3b). Before IPTG addition, a
discrete ladder of leading strands was seen (Fig. 3b, lanes 2, 14), in
which the 3’ ends of leading strands stalled ~29-33 nt from each
LacR molecule in the array. This ~30 nt gap probably corresponds
to the footprint of the CMG complex. As shown in Fig. 3b (red
lines) and quantified in Extended Data Fig. 6e, 78% of leading strands
were stalled at the first three lacO sites, indicating that most
replisomes were blocked at the outer edges of the array.

Upon addition of IPTG, extension of leading strands resumed
immediately (Fig. 3b, lanes 3-11 and 15-23). Notably, there was no
enhanced pausing near the 6th lacO repeat of the lacO12 array compared
with the lacO32 array. By 5.67 min, most leading strands had extended beyond
the 6th lacO repeat within both arrays (Fig. 3b, lanes 6 and 18, and
Extended Data Fig. 2c). This was also true for the leftward leading strands
(Extended Data Fig. 6f, g). Furthermore, leading strands were extended
beyond the 6th lacO repeat in the lacO12 and lacO32 arrays with similar
kinetics (Fig. 3c, d). When leading strands were analysed on alkaline
denaturing gels, we observed that all rightward and leftward leading
strands passed the mid-point of the array by 6.25 min (Extended Data
Fig. 2d, e), indicating that the converging leading strands were readily
extended past each other. In summary, we observed no detectable
slowing or pausing of DNA synthesis during termination, and
converging leading strands passed each other unhindered, implying that
converging replisomes do not pause or stall significantly.
Figure 3 | Leading strands pass each other unhindered during termination.
a, Schematic of rightward leading strands arrested at 12× and 32× lacO arrays
(lacO12 and lacO32, respectively), and the predicted point of fork collision
upon IPTG addition. b, LacR block-IPTG release was performed on p[lacO12]
and p[lacO32]. Termination intermediates were digested with Nt.BspQI.
Nascent strands were separated alongside a sequencing ladder (generated by
primer JDO107, green arrow in a) on a denaturing polyacrylamide gel and
visualized by autoradiography. The lacO sites of p[lacO12] are indicated in blue.
Red, yellow and grey lines indicate stall products that were quantified
(Extended Data Fig. 6e). c, Leading strands whose 3’ ends were located before
lacO7 were quantified (see Methods) along with dissolution (Extended Data
Fig. 6a-d). d, Experimental repeat of c.


Lagging strand gaps are rapidly filled in 

During SV40 replication termination, gaps of ~60 nt persist after
dissolution. To determine whether the appearance of such gaps
precedes the ligation step in our system, we mapped the 3’ ends of
the leftward leading strands and the 5’ ends of the rightward lagging
strands during termination within lacO12 (Fig. 4a). To this end, we
digested DNA intermediates with Nb.BbvCI or Nb.BtsI to release
leading or lagging strands, respectively (Fig. 4a), and separated
them on denaturing polyacrylamide gels. After IPTG addition, we
detected a prominent leading-strand product beyond the 12th lacO
repeat (species 274 in Fig. 4b; the 3’ and 5’ termini of all leading- and
lagging-strand products, respectively, are mapped relative to the
Nb.BtsI site), as also seen in Fig. 3b. The 3’ end of this species was
located ~3 nt from the 5’ end of the most abundant lagging-strand
product of the converging fork (271, Fig. 4c). We observed many
other, less prominent, leading-strand products (181-420, Fig. 4b),
most of which mapped close to corresponding lagging-strand
products (176-417, Fig. 4c). The results show that leading strands
are generally extended to within ~3 nt of the lagging strands
(Fig. 4d). It is likely that leading strands immediately abut lagging
strands and that the ~3 nt gap reflects imprecise mapping of lagging
strands (see Methods). In conclusion, we observed no evidence of
persistent gaps between leading and lagging strands during
replication termination.
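The comparison of leading-strand 3’ ends with the nearest lagging-strand 5’ ends can be expressed as a nearest-neighbour search over mapped positions. The sketch below is illustrative rather than the authors' analysis: it assumes both lists are in nt relative to the same reference site (here the Nb.BtsI site), and the sign of the gap depends on that coordinate convention. Only the pairing of species 274 with 271 comes from the text; the function name is hypothetical.

```python
from bisect import bisect_left
from typing import Dict, Iterable


def nearest_gaps(leading_3p: Iterable[int], lagging_5p: Iterable[int]) -> Dict[int, int]:
    """Map each leading-strand 3' end to its signed distance (nt) from the
    closest lagging-strand 5' end. All positions share one reference site."""
    lagging = sorted(lagging_5p)
    if not lagging:
        raise ValueError("no lagging-strand positions supplied")
    gaps = {}
    for pos in leading_3p:
        i = bisect_left(lagging, pos)
        # The closest lagging end lies just below or just above `pos`.
        candidates = lagging[max(i - 1, 0): i + 1]
        nearest = min(candidates, key=lambda x: abs(pos - x))
        gaps[pos] = pos - nearest
    return gaps


# Species 274 (leading, Fig. 4b) versus 271 (lagging, Fig. 4c).
print(nearest_gaps([274], [271]))  # {274: 3}
```

Sorting once and bisecting keeps the search fast even when many bands are mapped per lane.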
Figure 4 | Leading strands abut lagging strands of the opposing replisome
during termination. a, Cartoon illustrating the leading and lagging strands
released by Nb.BtsI and Nb.BbvCI nicking enzymes. Primers JDO111 (purple
arrow) and JDO110 (pink arrow) generated the sequencing ladders in b and
c, respectively. b, LacR block-IPTG release was performed on p[lacO12].
Termination intermediates were digested with Nb.BbvCI to liberate leftward
leading strands, which were separated alongside a sequencing ladder on a
denaturing polyacrylamide gel and visualized by autoradiography. Prominent
leading-strand products are highlighted (green symbols), and their sizes in
nucleotides, measured relative to the Nb.BtsI site, are indicated. c, Same
samples as in b were digested with Nb.BtsI to liberate rightward lagging strands.
The sizes of prominent lagging-strand products (orange symbols), measured
relative to the Nb.BtsI site, are indicated. d, Schematic of the mapped
leading (b) and lagging (c) strands.
CMGs dissociate late during termination


To determine when replisome components dissociate during
termination, we monitored MCM7, CDC45, Polε and RPA binding to
a site flanking the lacO array using chromatin immunoprecipitation
(ChIP) (FLK2 locus, Extended Data Fig. 7a). In parallel, we monitored
dissolution, ligation and decatenation. Before IPTG addition, MCM7,
CDC45, Polε and RPA were 4-8-fold enriched at the array in the
presence of LacR compared to buffer (Extended Data Fig. 7b-e,
5 min time point), demonstrating that the ChIP signal reflects
replisome stalling at the array. When IPTG was added at 5 min, MCM7,
CDC45, RPA and Polε largely dissociated by 9 min, whereas in the
absence of IPTG, they dissociated much more slowly (Extended Data
Fig. 7b-e). RPA dissociation correlated well with ligation, as expected,
since ligation marks the disappearance of any single-stranded
(ss)DNA in the termination zone (Fig. 5a, compare red squares and
blue circles). Notably, CDC45, MCM7 and Polε dissociated ~1.5 min
after dissolution and ~0.5 min after RPA dissociation and ligation
(Fig. 5a). A time course of ChIP at sequences adjacent to and within
the array (Extended Data Fig. 7f-i) was consistent with MCM7,
CDC45 and DNA Polε moving into the array and then back out after
dissolution (Extended Data Fig. 7j). MCM7 and CDC45 also
dissociated after dissolution during replication of plasmid DNA that lacked
a lacO array (p[empty], Extended Data Fig. 8a, b). Although the delay
between ligation and unloading of MCM7 and CDC45 was not readily
detectable on this template (Extended Data Fig. 8b), this was not
surprising, given the asynchrony of termination in this setting.
Together, the data support a model in which CDC45 and MCM7
dissociate late in termination, long after forks meet (dissolution)
and shortly after ligation.

If our model is correct, inhibiting CMG unloading should not affect
dissolution or ligation. To test this, we inhibited ubiquitin signalling,
which is required for chromatin dissociation of CMG. p[empty]
was replicated in extracts that were incubated with vehicle or the
deubiquitylating enzyme inhibitor ubiquitin-vinyl-sulfone (Ub-VS),
which leads to the depletion of free ubiquitin, and we performed
MCM7 and CDC45 ChIP. As shown in Fig. 5b, c, Ub-VS substantially
delayed MCM7 and CDC45 dissociation, and this effect was partially
reversed by co-addition of free ubiquitin. The same
inhibitory effect of Ub-VS on CMG unloading was observed when plasmids
were recovered from egg extract and blotted for MCM7 and CDC45
(Extended Data Fig. 8c). This analysis also confirmed previous
reports of MCM7 ubiquitylation during replication (Extended
Data Fig. 8c). Importantly, dissolution, ligation and decatenation were
not affected by Ub-VS (Fig. 5d, e and Extended Data Fig. 8f-i). We
conclude that defective CMG unloading does not affect dissolution,
ligation or decatenation, strongly supporting our model that CMG
unloading is a late event in replication termination.


Discussion 


We present a novel approach to induce synchronous and site-specific
replication termination. Using this system, we observe no slowing or
pausing of DNA synthesis as forks converge (Fig. 5f, panels i, ii).
Leading strands pass each other unhindered and immediately abut
downstream lagging strands before undergoing ligation (Fig. 5f,
panels iii-v). CMG remains associated with DNA after dissolution,
and it is unloaded only after the leading strand of one fork is ligated to
the lagging strand of the opposing fork (Fig. 5f, panel vii). Catenane
removal is initiated at the same time as ligation (Fig. 5f, panels v, vi). In
contrast to models of termination in which replication forks stall,
our data imply that topological stress between replisomes is handled
efficiently, and either that converging replisomes do not clash or that,
if they do, any remaining template DNA is immediately reeled into the
stalled replisome for duplication (not shown). We previously showed
that CMGs encircle the leading strand template at the replication
fork. Therefore, converging CMGs approach each other on opposite
strands, which helps explain how they could pass each other. If a
fork stalls (for example, at the ribosomal DNA locus), the same
termination mechanism could still operate provided that the stalled
fork remains stable until a converging fork arrives. We expect this to
be the case, given our recent observation that a single fork stalled at a
DNA interstrand cross-link does not collapse or lose its CMG
complex. We speculate that at telomeres the replisome simply runs off
the chromosome end.

Figure 5 | CMGs dissociate after dissolution
and ligation. a, LacR block-IPTG release was

Our observations that CMG dissociates after the ligation step
(Fig. 5a), and that ligation is not affected when CMG unloading is
impaired (Fig. 5b, c, e), strongly imply that CMG is unloaded from
dsDNA. We propose that when CMG reaches the 5’ end of the
opposing fork’s lagging strand, it passes over the ssDNA-dsDNA
junction and keeps moving along dsDNA (Fig. 5f), as previously
observed for purified MCM2-7 and CMG in vitro (see refs 23, 44;
but see also ref. 22). This scenario is appealing, as it would prevent
CMG from interfering with ligation of the nascent strands. We
propose that CMG ubiquitylation and its removal by p97 (refs 24, 25) are
triggered once CMG encircles dsDNA. Such a mechanism would help
to avoid inappropriate CMG unloading from active replication forks,
where CMG encircles ssDNA. Our results disagree with a recent
report, which concluded that inhibition of CMG unloading prevents
completion of DNA synthesis. In contrast, another report, which found
that defective CMG unloading does not prevent cell cycle progression,
is consistent with our model. We recently reported that CMG can be
unloaded from ssDNA when two replisomes collide with a DNA
interstrand cross-link. However, this process involves a unique,
BRCA1-dependent pathway that is not employed during
termination. In conclusion, the termination mechanism described here
allows rapid completion of DNA synthesis while minimizing the
possibility of premature replisome disassembly.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. 

Protein purification. Biotinylated LacR was purified using a protocol adapted
from Kenneth Marians’ laboratory (personal communication). The LacR open
reading frame was fused to a C-terminal AviTag (Avidity, Denver, CO) and
expressed from pET11a (pET11a[LacR-Avi]). To biotinylate the AviTag on
LacR-Avi, biotin ligase was co-expressed from pBirAcm (Avidity, Denver, CO).
To this end, pET11a[LacR-Avi] and pBirAcm were co-transformed into T7
Express cells (New England Biolabs) and grown in the presence of ampicillin
(100 µg ml−1) and chloramphenicol (17 µg ml−1). Expression of LacR-Avi and
the biotin ligase was induced by addition of IPTG to a final concentration of
1 mM. Cultures were supplemented with 50 µM biotin (Research Organics,
Cleveland, OH) to ensure efficient biotinylation of LacR-Avi.

Cell pellets were resuspended in lysis buffer (50 mM Tris-HCl, pH 7.5, 5 mM
EDTA, 100 mM NaCl, 1 mM DTT, 10% sucrose (w/v), Complete protease
inhibitor (Roche, Nutley, NJ)). The cells were lysed at room temperature in the
presence of 0.2 mg ml−1 lysozyme and 0.1% Brij 58. The insoluble,
chromatin-containing fraction was isolated by centrifugation at 4 °C. Chromatin-bound
LacR was then released by sonication (in 50 mM Tris-HCl, pH 7.5, 5 mM
EDTA, 1 M NaCl, 1 mM DTT, Complete protease inhibitor, 30 mM IPTG).
DNA was removed from the soluble fraction by addition of polymin P (final
concentration 1%), and LacR was then precipitated by addition of ammonium sulfate
(final concentration 37%). The precipitate was dissolved in wash buffer (50 mM
Tris-HCl, pH 7.5, 1 mM EDTA, 2.5 M NaCl, 1 mM DTT, Complete protease inhibitor)
and then applied to a column of SoftLink avidin resin (Promega, Madison, WI).
LacR was eluted (in 50 mM Tris-HCl, pH 7.5, 1 mM EDTA, 100 mM NaCl, 1 mM
DTT, 5 mM biotin) and dialysed overnight (against 50 mM Tris-HCl, pH 7.5,
1 mM EDTA, 150 mM NaCl, 1 mM DTT, 38% glycerol (v/v)). Purified LacR was
frozen in liquid nitrogen and stored at −80 °C. A more detailed purification
protocol is available on request.

Cyclin A was purified as described previously.

Plasmid construction and preparation. pJD82 (Extended Data Table 1) was
created by replacing the SacI-KpnI fragment of pBlueScript II KS- with the
sequence: GAGCTCTCACACCTACAAGGGATGTACATCAATTGTGAGCGGATAACAATTGTTAGGGAGGAATTGTGAGCGGATAACAATTTGGAGTTGATAATTGTGAGCGGATAACAATTGGCTTCAACGTAATTGTGAGCGGATAACAATTTCCGTACGAATGTGCCGAACTTATGGTACC.
This contains four tandem repeats of the lac operator sequence
(AATTGTGAGCGGATAACAATT) interspersed with 10-11 bp of random sequence
(average 10.33 bp). Additional tandem repeats of the BsiWI-BsrGI fragment were then
cloned into pJD82, and subsequently derived vectors, to generate arrays of 8, 12,
16, 32 and 48 lacO repeats (Extended Data Table 1). Recognition sites for nicking
enzymes were introduced by QuikChange mutagenesis (Agilent Technologies,
Santa Clara, CA) according to the manufacturer’s guidelines.
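As a quick consistency check on the pJD82 insert sequence given above, the sketch below locates every copy of the 21-bp lac operator and measures the spacers between consecutive copies; on this sequence it should report four operators separated by 10, 10 and 11 bp (mean 10.33 bp), plus single SacI, BsrGI, BsiWI and KpnI sites. The sequence is transcribed from the text; the helper names are hypothetical, and the enzyme recognition sequences are the standard ones rather than values given in the paper.

```python
LAC_OPERATOR = "AATTGTGAGCGGATAACAATT"  # 21 bp, as given in the text

# pJD82 SacI-KpnI insert, transcribed from the Methods text.
PJD82_INSERT = (
    "GAGCTCTCACACCTACAAGGGATGTACATCAATTGTGAGCG"
    "GATAACAATTGTTAGGGAGGAATTGTGAGCGGATAACAATTTGGAGT"
    "TGATAATTGTGAGCGGATAACAATTGGCTTCAACGTAATTGTGAGCGG"
    "ATAACAATTTCCGTACGAATGTGCCGAACTTATGGTACC"
)

# Standard top-strand recognition sequences for the enzymes named in the text.
SITES = {"SacI": "GAGCTC", "BsrGI": "TGTACA", "BsiWI": "CGTACG", "KpnI": "GGTACC"}


def find_all(seq: str, motif: str) -> list[int]:
    """Return start indices of every (possibly overlapping) occurrence."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def spacer_lengths(seq: str, motif: str) -> list[int]:
    """Length of sequence separating consecutive copies of `motif`."""
    starts = find_all(seq, motif)
    return [b - (a + len(motif)) for a, b in zip(starts, starts[1:])]


if __name__ == "__main__":
    spacers = spacer_lengths(PJD82_INSERT, LAC_OPERATOR)
    print(len(find_all(PJD82_INSERT, LAC_OPERATOR)), spacers,
          round(sum(spacers) / len(spacers), 2))
    for name, site in SITES.items():
        print(name, find_all(PJD82_INSERT, site))
```

Only the forward orientation of the operator is searched, which suffices here because all four copies in the insert share the same orientation.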

To propagate lacO plasmid DNA, plasmids were transformed into DH5α cells
and grown for a minimal number of passages in the presence of 2 mM IPTG.
DNA was prepared using the QIAprep spin kit (Qiagen, Valencia, CA). To
eliminate preparations containing genetic rearrangements (typically ~25%),
each preparation was separated by electrophoresis on a 0.8% agarose gel and
visualized by ethidium bromide staining. Preparations that were free of
rearranged plasmids were then verified by sequencing (Genewiz, Cambridge, MA).
Xenopus egg extracts and DNA replication. Xenopus egg extracts were prepared
from Xenopus laevis wild-type males and females 2-5 years of age, as approved by
the Harvard Medical School Institutional Animal Care and Use Committee
(IACUC) and as described previously. For DNA replication, 1 volume of
‘licensing mix’ was prepared by adding plasmid DNA to High Speed Supernatant (HSS)
of egg cytoplasm to a final concentration of 7.5-15 ng µl−1. Licensing mix was
incubated for 30 min at room temperature, leading to the formation of
pre-replication complexes (pre-RCs). Next, licensing mix was supplemented with
0.1 volumes of cyclin A to a final concentration of 576 nM and incubated for a further
10 min at room temperature, as previously described. Cyclin A treatment was
performed to achieve highly synchronous DNA replication (Extended Data
Fig. 9). Finally, 1.9 volumes of nucleoplasmic extract (NPE) were added to initiate
Cdk2-dependent replication at pre-RCs. In all figures, ‘0 minutes’ represents the
time 30 s after NPE addition. To radiolabel DNA, NPE was supplemented with
[α-32P]dATP. Reactions were stopped with 10 volumes of Stop Solution (0.5% SDS,
25 mM EDTA, 50 mM Tris-HCl pH 7.5). DNA in Stop Solution was treated with
RNase A (190 ng µl−1 final concentration) and then Proteinase K (909 ng µl−1 final
concentration) before either direct analysis by gel electrophoresis or purification
of DNA as described previously. For Ub-VS experiments, Ub-VS (Boston
Biochem) was added to a final concentration of 20 µM to HSS 5 min before
addition of plasmid DNA, and to NPE 5 min before addition of HSS, with or
without 120 µM ubiquitin (Boston Biochem). Unless otherwise stated in the
figure legend, all experiments were performed at least twice and a representative
result is shown. Replicate samples were collected from independently assembled
replication reactions, and therefore represent biological replicates.
Immunodepletions. To deplete Topo II-α from Xenopus egg extracts, one volume
of Protein A Sepharose Fast Flow (PAS) (GE Healthcare) was incubated with
4.5 volumes of affinity-purified anti-Topo II-α antibody raised against the
C-terminal 20 residues (1 mg ml−1). For mock depletion, an equivalent quantity
of nonspecific IgGs was used. Five volumes of pre-cleared HSS or NPE were then
mixed with one volume of the antibody-bound Sepharose and incubated for
45 min at 4 °C; for NPE, this was repeated once. Depleted extracts were
collected and used immediately for DNA replication.

Induction of termination. To monitor termination, 0.05 volumes of plasmid
DNA (150-300 ng µl−1) were incubated with 0.1 volumes of LacR (54 µM) or dialysis
buffer for at least 90 min at room temperature to allow formation of LacR arrays
on the DNA. Licensing mix was prepared by adding 0.85 volumes of HSS, and
DNA was replicated as described above. To induce termination, 0.06 volumes of
IPTG were added (to a final concentration of 10 mM) at the time indicated
(typically 5 min), which triggered dissociation of lacO-bound LacR. To accurately
withdraw samples at the times indicated, reactions composed of the same
licensing mix and NPE were staggered where necessary.
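The reaction volumes above can be checked with simple dilution arithmetic (C_final = C_stock × V_added / V_total). The sketch below assumes that the 0.05 volumes of DNA, 0.1 volumes of LacR and 0.85 volumes of HSS together make up the 1 volume of licensing mix. Under that assumption, a 150-300 ng µl−1 DNA stock gives the 7.5-15 ng µl−1 stated for the licensing mix, and 54 µM LacR gives ~5.4 µM (a derived figure, not one reported in the text). The cyclin A line further assumes that its 0.1 volumes are added to 1 volume of licensing mix. The helper names are hypothetical.

```python
def final_concentration(stock: float, added_volume: float, total_volume: float) -> float:
    """Concentration after adding `added_volume` of a stock to a mix whose
    final volume is `total_volume` (same units as `stock`)."""
    if total_volume <= 0 or not 0 <= added_volume <= total_volume:
        raise ValueError("added volume must lie between 0 and the total volume")
    return stock * added_volume / total_volume


def required_stock(target: float, added_volume: float, total_volume: float) -> float:
    """Inverse calculation: stock concentration needed to reach `target`."""
    if added_volume <= 0 or total_volume < added_volume:
        raise ValueError("added volume must be positive and not exceed the total")
    return target * total_volume / added_volume


# Licensing mix: 0.05 vol DNA + 0.1 vol LacR + 0.85 vol HSS (assumed to total 1 vol).
licensing_volume = 0.05 + 0.1 + 0.85
dna_low = final_concentration(150, 0.05, licensing_volume)   # ng/ul
dna_high = final_concentration(300, 0.05, licensing_volume)  # ng/ul
lacr = final_concentration(54, 0.1, licensing_volume)        # uM
# Assumed: 0.1 vol cyclin A added to 1 vol licensing mix (1.1 vol total).
cyclin_stock = required_stock(576, 0.1, 1.1)                 # nM
print(f"DNA {dna_low:.1f}-{dna_high:.1f} ng/ul, LacR {lacr:.1f} uM, "
      f"cyclin A stock ~{cyclin_stock:.0f} nM")
```

The same two helpers apply to the later additions (NPE, IPTG) once the volume each "0.x volumes" refers to is pinned down, which the text leaves implicit.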

2D gel electrophoresis. 2D gels were performed as described. Briefly, purified
DNA was digested with XmnI (New England BioLabs) and then separated by
native-native 2D gel electrophoresis. Samples were separated in the first
dimension on a 0.4% agarose gel at 0.75 V cm−1 for approximately 40 h at room
temperature. The gel was stained with 0.3 µg ml−1 ethidium bromide, allowing
the 2-8 kb size range to be excised. A second-dimension gel containing 1% agarose
and 0.3 µg ml−1 ethidium bromide was cast over the gel slice from the first
dimension. DNA was separated in the second dimension at 4.5 V cm−1 for
12 h at 4 °C.

Termination assays. To monitor dissolution, 0.25-1.0 ng µl−1 of purified DNA
was incubated in CutSmart Buffer with 0.4 units µl−1 of XmnI (New England
BioLabs) at 37 °C for 1 h. Digested products were separated on a 1.2% agarose gel
at 4 V cm−1 and detected by autoradiography. Dissolution (%) was calculated as
the percentage of total signal in each lane present in the linear products of
digestion (Lins, Fig. 1c).
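Dissolution is a lane-fraction measurement: the signal in the product band divided by the total signal in the lane. The decatenation measure in the Methods uses the same calculation, with circular monomers as the product. A minimal sketch, assuming band intensities have already been exported from the phosphorimager software; the band labels and the function name are illustrative.

```python
from collections.abc import Iterable, Mapping


def percent_of_lane(bands: Mapping[str, float], product: Iterable[str]) -> float:
    """Percentage of the total lane signal found in the named product band(s)."""
    total = sum(bands.values())
    if total <= 0:
        raise ValueError("lane has no signal")
    wanted = set(product)
    missing = wanted.difference(bands)
    if missing:
        raise KeyError(f"unknown band(s): {sorted(missing)}")
    return 100.0 * sum(bands[name] for name in wanted) / total


# Hypothetical XmnI lane: linear products ('Lins') plus unresolved double-Ys ('DY').
lane = {"Lins": 30.0, "DY": 70.0}
print(percent_of_lane(lane, ["Lins"]))  # 30.0
```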

To monitor ligation, 0.25-1.0 ng µl−1 of purified DNA was incubated in
CutSmart buffer with 0.2 units µl−1 of AlwNI (New England BioLabs) at 37 °C
for 1 h. Digests were terminated by addition of EDTA to 30 mM, and products
were then separated on a 1.5% denaturing alkaline agarose gel at 1.5 V cm−1 and
detected by autoradiography. The percentage of total signal in each lane present
in the full-length strands was measured (FLS, Fig. 1e). During electrophoresis,
partial hydrolysis caused signal from the FLS to smear down the gel. To correct for this, a
fully ligated plasmid was cleaved and analysed on the same gel. The percentage of
signal in the FLS band of the fully ligated plasmid was measured ($\mathrm{FLS}^{\max}$) and used to
correct the signal in the other lanes to yield an accurate measure of ligation. Ligation
(%) was calculated as $\mathrm{FLS}/\mathrm{FLS}^{\max} \times 100$.
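The hydrolysis correction amounts to rescaling each lane by the FLS fraction reached by the fully ligated control run on the same gel. A minimal sketch, assuming both inputs are percentages of total lane signal (function name hypothetical). Values slightly above 100% can arise from noise and are left unclamped here so that they remain visible.

```python
def ligation_percent(fls_sample: float, fls_max: float) -> float:
    """Ligation (%) = FLS / FLS_max x 100, where both arguments are the
    percentage of lane signal in the full-length strand band."""
    if fls_max <= 0:
        raise ValueError("fully ligated control must have FLS signal")
    if fls_sample < 0:
        raise ValueError("FLS signal cannot be negative")
    return fls_sample / fls_max * 100.0


# Hypothetical lanes: the control shows 80% FLS and a sample shows 40% FLS.
print(ligation_percent(40.0, 80.0))  # 50.0
```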

To monitor decatenation, 0.25-1.0 ng µl−1 of purified DNA was separated on a
0.8% agarose gel at 4 V cm−1 and detected by autoradiography. Decatenation (%)
was measured as the percentage of total signal in each lane present in circular
monomers (CMs, Fig. 1g).

To monitor DNA synthesis within a lacO array (Fig. 2), 0.25-1.0 ng µl−1
of purified DNA was incubated in buffer 3.1 with 0.2 units µl−1 PvuII and
0.2 units µl−1 AflIII (New England BioLabs) at 37 °C for 1 h. Digested products
were separated on a 1.2% agarose gel at 4 V cm−1 and detected by
autoradiography. To measure array synthesis ($\mathrm{SYN}^{\mathrm{arr}}$), the 0.5-1.5 kb region of each lane
was quantified (lins and DYs, Fig. 2b). To measure vector synthesis ($\mathrm{SYN}^{\mathrm{vec}}$), the
2-6 kb region of each lane was quantified, which included the ~3.0 and ~6.0 kb
bands that arose when one, or both, lagging strands did not cut, respectively. Total
signal in each lane ($\mathrm{SYN}^{\mathrm{tot}}$) was also measured. To correct for differences in
efficiency of DNA extraction, total lane signal was also measured in a set of
unprocessed samples ($\mathrm{SYN}^{\mathrm{UN}}$), which were separated and detected in parallel.
Array synthesis (%) was calculated as $\mathrm{SYN}^{\mathrm{UN}}/\mathrm{SYN}^{\mathrm{tot}} \times \mathrm{SYN}^{\mathrm{arr}}$, vector
synthesis was calculated as $\mathrm{SYN}^{\mathrm{UN}}/\mathrm{SYN}^{\mathrm{tot}} \times \mathrm{SYN}^{\mathrm{vec}}$, and in both cases the 10 min
time point was assigned a value of 100%. The same approach was also used to
quantify synthesis of the 294/794 bp fragments (quantified in the same manner as
the array) and the 2,354 bp fragments (quantified in the same manner as the
vector fragments) in Extended Data Fig. 1. In Fig. 2 and Extended Data Fig. 1, a
longer exposure of the array fragment is shown because it is less intense than the
vector fragment.
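The extraction correction and normalization can be written out as two steps: scale each region's signal by the ratio of unprocessed to processed total lane signal, then express every time point relative to the 10 min value. The sketch below follows the formula in the text; the data layout (a dictionary keyed by time in minutes) and the function names are assumptions.

```python
def corrected_signal(region: float, lane_total: float, unprocessed_total: float) -> float:
    """SYN^UN / SYN^tot x SYN^region: rescale a region's signal to correct
    for differences in DNA extraction efficiency."""
    if lane_total <= 0:
        raise ValueError("processed lane must have signal")
    return unprocessed_total / lane_total * region


def normalize_to_time(series: dict[float, float],
                      reference_time: float = 10.0) -> dict[float, float]:
    """Express each time point as a percentage of the reference time point."""
    ref = series[reference_time]
    if ref == 0:
        raise ValueError("reference time point has no signal")
    return {t: 100.0 * v / ref for t, v in series.items()}


# Hypothetical (region, SYN^tot, SYN^UN) values at 6 and 10 min.
raw = {6.0: (30.0, 100.0, 100.0), 10.0: (50.0, 100.0, 120.0)}
array_syn = {t: corrected_signal(*vals) for t, vals in raw.items()}
print(normalize_to_time(array_syn))
```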

To analyse topoisomers (Extended Data Fig. 3d), 0.25 ng µl−1 of radiolabelled
DNA was incubated in 1× buffer A and 1× buffer B (Topogen) with 0.2 U µl−1
human Topo II-α (Topogen) at 37 °C for 15 min, or in CutSmart buffer with
0.4 U µl−1 XmnI or 0.04 U µl−1 Nt.BspQI (New England Biolabs) for 1 h.
Nascent strand analysis. To nick rightward leading strands, 1-2 ng µl−1 of
purified DNA was incubated in buffer 3.1 with 0.4 units µl−1 Nt.BspQI (New England
BioLabs) at 37 °C for 1 h. To nick leftward leading strands, 1-2 ng µl−1 of purified
DNA was incubated in CutSmart buffer with 0.04 units µl−1 Nb.BsrDI (New
England BioLabs) at 65 °C for 1 h. To nick rightward leading strands closer to
the lacO array, 1-2 ng µl−1 of purified DNA was incubated in CutSmart buffer
with 0.04 units µl−1 Nb.BbvCI (New England BioLabs) at 37 °C for 1 h. To nick
leftward lagging strands, 1-2 ng µl−1 of purified DNA was incubated in buffer 3.1
with 0.04 units µl−1 Nb.BtsI (New England BioLabs) at 37 °C for 1 h. In all
cases, nicking reactions were stopped by the addition of 0.5 volumes of Stop
solution B (95% formamide, 20 mM EDTA, 0.05% bromophenol blue, 0.05%
xylene cyanol FF).

Nicked DNA (1.5-2 µl sample) was separated on a 42-cm-long 4% or 5%
polyacrylamide sequencing gel using a Model S2 sequencing gel apparatus
(Apogee Electrophoresis, Baltimore, MD) according to the manufacturer’s
guidelines. To maximize the range of nascent products that could be resolved, gels
were cast with a thickness gradient of 0.4 to 1.2 mm, beginning to end, to establish
an electrical field gradient during electrophoresis. Sequencing gels were
prepared with Rapidgel-XL in 0.8× GTG buffer (USB Corporation, Cleveland).
Sequencing ladders were generated using the Cycle Sequencing Kit (USB
Corporation, Cleveland) with primers JDO107, JDO109, JDO110 and JDO111
(Extended Data Table 1) and pJD150 (Extended Data Table 1) as template DNA.

Mapping and quantification of the nascent strands in Figs 3 and 4 were
performed as follows. Nascent leading and lagging strands were mapped using the
sequencing ladders generated by the primers indicated in Fig. 3a and Fig. 4a (see
Extended Data Table 1 for sequences). Slight discrepancies may exist between
mapped and actual lagging strand product sizes (Fig. 4c) since the sequencing
ladder (generated by JDO110; Fig. 4a) is complementary to the lagging strands. A
fraction of lagging strand products 176-302 were not extended upon IPTG
addition, probably because they were reached by the rightward leading strand first.
Lagging strand products 312-417 appeared de novo after IPTG addition, and
therefore represent growing lagging strands of the leftward fork. To quantify
leading strand progression (Fig. 3c, d), leading strands whose 3’ ends were located
before lacO7 (in Fig. 3b and data not shown) were quantified, and peak signal was
assigned a value of 100 (%Max).
ChIP and quantitative PCR. ChIP and quantitative PCR (qPCR) were per- 
formed essentially as described’. Chromatin was withdrawn and crosslinked in 
the presence of 1% formaldehyde for 10 min at room temperature. Crosslinking 
was then quenched by the addition of 0.1 volumes glycine (1.25 M) for 10 min. 
Samples were then spun through Bio-Spin P-6 Gel (containing Tris Buffer,
Bio-Rad) to remove salts and small molecules, before being stored in 10 volumes of
sonication buffer (20 mM Tris pH 7.5, 150 mM NaCl, 2 mM EDTA, 1% IGEPAL
CA-630 (v/v), 2 mM PMSF, 5 µg µl⁻¹ aprotinin, 5 µg µl⁻¹ leupeptin). Samples
were then sonicated to shear chromatin into approximately 250 bp fragments. 

The antibodies used were described previously”. Antibodies were incubated 
with chromatin overnight at 4°C, then immunoprecipitated by addition of 
Protein A-Sepharose Fast Flow beads (GE Healthcare) for 2h at room temper- 
ature. Beads were washed sequentially with sonication buffer, high salt buffer 
(sonication buffer supplemented with 500 mM NaCl and 100 mM KCl), wash
buffer (10 mM Tris pH 7.5, 0.25 M LiCl, 1 mM EDTA, 0.5% NP-40 (v/v), 0.5%
SDS (w/v)) and TE (10 mM Tris pH 7.5, 1 mM EDTA), before being eluted into 
elution buffer (50 mM Tris pH 7.5, 10 mM EDTA, 1% SDS) at 65 °C for 20 min. 
Eluted chromatin, and input samples, were treated with RNase for 30 min at 
37 °C. Finally, proteins were degraded by addition of NaCl (250 mM final) and 
treatment with Pronase (2 µg µl⁻¹ final) at 42 °C for 6 h. DNA-peptide crosslinks
were reversed by treatment at 70 °C for a further 9 h. DNA was subsequently
phenol:chloroform extracted and ethanol precipitated. The absolute amount of 
DNA recovered from the immunoprecipitated and input samples was measured 
by quantitative PCR (qPCR) relative to a standard curve. The qPCR primers used 
are listed in Extended Data Table 1. Binding was measured as the percentage 
recovery of immunoprecipitated DNA relative to the input ($\mathrm{EXP}^{\mathrm{REC}}$).
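A minimal Python sketch of the absolute quantification and percentage recovery described above follows. It assumes a log-linear standard curve (Ct against log10 of template amount) fitted by least squares, and a hypothetical `input_fraction` parameter giving the share of chromatin set aside as the input sample; neither detail is specified in the text, and the function names are illustrative only.

```python
import math


def fit_standard_curve(amounts, cts):
    """Least-squares fit of Ct = slope * log10(amount) + intercept."""
    xs = [math.log10(a) for a in amounts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_ct(ct, slope, intercept):
    """Invert the standard curve to obtain an absolute DNA amount."""
    return 10 ** ((ct - intercept) / slope)


def percent_recovery(ip_amount, input_amount, input_fraction):
    """EXP_REC: IP DNA as a percentage of total input DNA.

    input_fraction is the (assumed) fraction of chromatin kept as the input
    sample, so input_amount / input_fraction estimates the total DNA that
    was available to the immunoprecipitation.
    """
    return 100.0 * ip_amount / (input_amount / input_fraction)
```

In this sketch, an IP that recovers 2 units of DNA when a 10% input sample contains 10 units corresponds to 2% recovery.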

To minimize error in the ChIP process, an internal control was built into all 
experiments. Xenopus egg extracts were used to separately replicate a different
plasmid, pQUANT (see Extended Data Table 1 for sequences). Mid-way through
replication, pQUANT was crosslinked, quenched and spun through Bio-Spin
P-6 gel (as above) to yield a single pool of heterologous chromatin that was 
bound by replication proteins. An equal amount of pQUANT chromatin 
was added to all experimental chromatin samples before sonication, and this 
was carried through the entire ChIP procedure. For each set of
immunoprecipitations, the recovery of pQUANT ($\mathrm{QNT}^{\mathrm{REC}}$) should be identical between
samples. To correct for technical variation in any set of immunoprecipitations,
average pQUANT recovery ($\mathrm{QNT}^{\mathrm{AVG}}$) was calculated and normalized recovery
(%) was calculated as $\mathrm{EXP}^{\mathrm{REC}} \times \mathrm{QNT}^{\mathrm{AVG}} / \mathrm{QNT}^{\mathrm{REC}}$. This ensured that the only
sources of technical variation were the crosslinking process and the qPCR. To 
maximize the reliability of the qPCR, these measurements were performed 
in quintuplicate and the median value was used. Where three ChIP
experiments were combined and plotted as mean ± s.d. (Fig. 5a-c and Extended
Data Figs 7f-i and 8j) it was necessary to normalize the data to correct for
differences in absolute IP efficiency between experiments. For each protein
measured by ChIP, mean recovery across all loci in all samples ($\mathrm{mean}^{\mathrm{EXP}}$) was
calculated for each experiment ($\mathrm{mean}^{\mathrm{EXP1}}$, $\mathrm{mean}^{\mathrm{EXP2}}$ and $\mathrm{mean}^{\mathrm{EXP3}}$) and used to
generate a correction factor for each experiment (for example, for experiment
1 the correction factor is $[(\mathrm{mean}^{\mathrm{EXP1}} + \mathrm{mean}^{\mathrm{EXP2}} + \mathrm{mean}^{\mathrm{EXP3}})/3]/\mathrm{mean}^{\mathrm{EXP1}}$). To
measure dissociation (Fig. 5a-c), recovery of the FLK2 locus was measured (shown in
Extended Data Figs 7f-i and 8j) and peak signal was assigned a value of 0, while
background signal (measured at 4 or 5 min for Fig. 5a, or 10 min for Fig. 5b, c)
was assigned a value of 100. The experiments shown in Fig. 5a and Extended
Data Fig. 7f-i were repeated three times, once with p[lacOx12] and twice with
p[lacOx16].
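The normalization steps above (median of replicate qPCR values, the pQUANT spike-in correction, the between-experiment correction factor and the 0 to 100 dissociation scale) can be sketched as below. The data layout (dicts keyed by sample name, a list of per-experiment means) and all function names are hypothetical; this illustrates the arithmetic under those assumptions and is not the authors' analysis code.

```python
from statistics import mean, median


def median_of_replicates(values):
    """Median of replicate qPCR measurements (quintuplicate in the text)."""
    return median(values)


def normalize_to_pquant(exp_rec, qnt_rec):
    """Apply EXP_REC * QNT_AVG / QNT_REC within one set of IPs.

    exp_rec and qnt_rec map sample names to % recovery of the experimental
    locus and of the spiked-in pQUANT control (hypothetical data layout).
    """
    qnt_avg = mean(qnt_rec.values())
    return {s: exp_rec[s] * qnt_avg / qnt_rec[s] for s in exp_rec}


def correction_factors(experiment_means):
    """Factor for experiment i = (mean of all experiment means) / mean_i."""
    grand = mean(experiment_means)
    return [grand / m for m in experiment_means]


def percent_dissociation(signal, peak, background):
    """Rescale so that peak signal maps to 0 and background maps to 100."""
    return 100.0 * (peak - signal) / (peak - background)
```

For example, if pQUANT recovery in one sample is half the average for its set, that sample's experimental recovery is doubled; taking the median of five replicates keeps a single aberrant qPCR reading from shifting the result.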

Plasmid pull downs. Plasmid pull downs were performed essentially as 
described”, with the following exceptions. Beads were resuspended in buffer 
supplemented with 4% DMSO and 100 µM NMS-873** to block further CMG
unloading once the samples were withdrawn”. Plasmid-associated proteins from 
40-80 ng of plasmid were isolated, and a quarter of the sample was analysed by 
western blotting using previously described antibodies against CDC45, MCM7 
and PCNA“. 

HIS6-Ub immunoprecipitations. Ni-NTA Superflow Resin (Qiagen) was
washed three times with Urea buffer (10 mM imidazole, 0.2% NP-40, 8 M urea,
500 mM NaH₂PO₄, 50 mM Tris-HCl, pH 8.0). For each immunoprecipitation,
10 µl of resin was added per tube and resuspended to 191 µl in Urea buffer.
Extracts were supplemented with 100 µM HIS6-ubiquitin (Boston Biochem)
and replication was carried out as described above. At the indicated time, 9 µl of
extract was mixed with the bead mix and samples were incubated for 1h at room 
temperature, with end-over-end rotation. Resin was washed three times with urea 
buffer. All residual buffer was removed, and resin was boiled for 5 min in 30 µl
sample buffer (125 mM Tris-HCl pH 6.8, 20% glycerol, 6.1% SDS, 0.01%
bromophenol blue, 10% β-mercaptoethanol). 30 µl of 0.5 M imidazole was added to each
sample and HIS6-tagged proteins were eluted off the resin for 60 min at room
temperature, with gentle agitation. Resin was spun down at 1,000 RCF for 1 min,
and the supernatant was removed. 10 µl of each sample was resolved on an
SDS-PAGE gel alongside an input control and analysed by western blotting using the
previously described antibody against MCM7”. In Extended Data Fig. 8d, a 
longer exposure of the IP lanes is shown, since they are far less intense than the 
input lanes. 
Extended Data Figure 1 | Sequence-specific termination can be induced at a
LacR array. a, To investigate whether a LacR array blocks replication forks, a 
plasmid containing a tandem array of 16 lac operator (lacO) sequences, 
p[lacOx16], was incubated with buffer or LacR and then
replicated in egg extract containing [α-³²P]dATP. Radiolabelled replication
intermediates were cleaved with XmnI (far left cartoon) and separated 
according to size and shape by 2D gel electrophoresis (see schematic of 2D gel). 
As replication neared completion at 4.5 min, mainly linear molecules were 
produced in the presence of buffer (orange arrowhead). In contrast, in
the presence of LacR, a discrete spot appeared on the double-Y arc (blue
arrowhead), demonstrating that converging replication forks accumulate at a
specific locus on p[lacOx16]. These data indicate that 16 copies of LacR block
replication forks. b-f, To test whether the double-Y structures observed in panel
a arose from replication forks stalling at the outer edges of the lacO array, we
tested whether LacR specifically inhibited replication of lacO sequences. To this
end, p[lacOx16] (c) and the parental plasmid lacking lacO repeats, p[empty] (b),
were incubated in the presence of buffer or LacR and replicated using Xenopus
egg extracts containing [α-³²P]dATP. Radiolabelled replication intermediates
were cleaved with AflIII and PvuII to release the 2,354-bp plasmid backbone
(b and c) and a 294-bp control fragment from p[empty] (b) or a 794-bp lacO
fragment from p[lacOx16] (c). The plasmid backbone and the respective inserts
were separated on a native gel and detected by autoradiography (d). A 
longer exposure of the small fragments is shown, since they are less intense 
than the large fragments. The results in panel d were quantified in e and 
f. Notably, LacR specifically inhibited replication of the lacO-containing 
fragment in p[lacOx16] (f, blue circles) but not the control fragment in
p[empty] (e, green circles). We conclude that LacR prevents replication of
the lacO array and that the double-Ys in panel a represent forks converged
on the outer edges of the array. Importantly, synthesis within the 2,354-bp
backbone fragment (f, orange circles) of p[lacOx16] was not inhibited in
the presence of LacR, indicating that no global structural changes occur that 
inhibit replication. 
Extended Data Figure 2 | Supplementary fork progression data. a, The gel
shown in Fig. 1e was overexposed and shown in its entirety so that the smaller
leftward strands (LWS, Fig. 1d) could be detected. As observed for the
rightward strands (RWS, Fig. 1e), LWS rapidly increased in size and then
disappeared as they were ligated to produce full-length strands (FLS, Fig. 1e).
b-e, To determine whether the heterogeneity of LWS (a) and RWS (Fig. 1e) was
due to delayed extension of lagging strands, or because a significant fraction
of leading strands did not restart upon IPTG addition, we specifically
monitored leading strand progression upon IPTG addition on p[lacOx16]. To
this end, DNA samples were treated with Nt.BspQI or Nb.BsrDI to specifically
liberate the rightward or leftward leading strands, respectively (b), and
DNAs were separated on a denaturing agarose gel (c). Before IPTG addition,
discrete leading strand products of the expected size were observed (lanes 2 and
10). The presence of two stall products reflects the fact that the replisome
bypasses LacR at a slow rate (see also Fig. 3). Upon IPTG addition, these
species rapidly and completely shifted up the gel, indicating that rightward and
leftward leading strands restarted efficiently. Therefore, the heterogeneity of
the LWS (a) and RWS (Fig. 1e) is probably due to delayed ligation of a new
Okazaki fragment to the lagging strands. Quantification of leading strands that
had not reached the midpoint of the array (rightward and leftward strands
smaller than 550 and 500 nt, respectively, b) revealed that by 6.25 min, 90% of
rightward and leftward leading strands had passed the midpoint of the array
(d, e). This demonstrates that leading strands pass each other when forks meet.
KB, kilobase ladder, with the length of each band (in kilobases) labelled. 
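The midpoint quantification in d and e can be illustrated with the sketch below, which reports the percentage of lane signal in strands shorter than the cutoff (550 nt rightward, 500 nt leftward). Reading the gel as a list of (length, signal) pairs and normalizing to total lane signal without a length correction are assumptions made here for illustration; the function name is hypothetical.

```python
# Cutoffs taken from the legend: strands shorter than these have not
# reached the midpoint of the array.
CUTOFF_NT = {"rightward": 550, "leftward": 500}


def percent_not_past_midpoint(lane, cutoff_nt):
    """Percentage of lane signal in strands shorter than cutoff_nt.

    lane is a list of (length_nt, signal) pairs read off a denaturing gel
    (hypothetical format); normalizing to total lane signal is an assumption.
    """
    total = sum(signal for _, signal in lane)
    below = sum(signal for length, signal in lane if length < cutoff_nt)
    return 100.0 * below / total
```

The percentage that has passed the midpoint is then 100 minus the returned value.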
Extended Data Figure 3 | Topo-II-dependent decatenation of p[lacOx16].
a, The autoradiograph in primary Fig. 1g is reproduced with cartoons
indicating the structures of the replication and termination intermediates n-n,
n-sc, sc-sc, n and sc (see Fig. 1 for definitions). The order of appearance of the
different catenanes matches previous work’ (n-n, then n-sc, then sc-sc).

b-d, To determine the role of Topo II during termination within a lacO array,
termination was monitored in mock- or Topo-II-depleted extracts. To confirm
immunodepletion of Topo II, mock- and Topo-II-depleted NPE were blotted
with MCM7 and Topo II antibodies (b). p[lacOx16] was incubated with LacR,
then replicated in either mock- or Topo-II-depleted egg extracts in the presence
of [α-³²P]dATP, and termination was induced with IPTG (at 7 min). Untreated
DNA intermediates were separated by native gel electrophoresis (c). In the
mock-depleted extract, nicked and supercoiled monomers were readily
produced (as in panel a, albeit with slower kinetics due to nonspecific inhibition
of the extracts by the immunodepletion procedure), while in the
Topo-II-depleted extracts, a discrete species was produced. DNA from the last time point
in each reaction (lanes 4 and 8 in panel c) was purified and treated with XmnI,
which cuts p[lacOx16] once, with Nt.BspQI, which nicks p[lacOx16] once, or with
recombinant Topo II, and then separated by native gel electrophoresis (d).
Cleavage of the mock- and Topo-II-depleted products with XmnI yielded
the expected linear 3.15-kb band (lanes 2 and 6), demonstrating that
in both extracts all products were fully dissolved topoisomers of each other.
Relaxation of the mock-depleted products by nicking with Nt.BspQI yielded a 
discrete band corresponding to nicked plasmid (lane 3), while the Topo-II- 
depleted products were converted to a ladder of discrete topoisomers (lane 7), 
which we infer represent catenated dimers of different linking numbers, 
since the mobility difference cannot be due to differences in supercoiling. 
Importantly, the mobility shift after Nt.BspQI treatment (lane 5 versus lane 7) 
demonstrated that the Topo-II-depleted products (lane 5) were covalently 
closed and thus in the absence of Topo II, ligation of the daughter strands still 
occurred. Treatment of the mock- and Topo-II-depleted products with 
recombinant human Topo II produced the same relaxed monomeric species 
(lanes 4 and 8), further confirming that the Topo-II-depleted products 
contained catenanes. Collectively, these observations demonstrate that 
termination within a lacO array in Topo-II-depleted extracts produces highly 
catenated supercoiled-supercoiled dimers, as seen in cells lacking Topo II’*”’. 
These data confirm that Topo II is responsible for decatenation and argue that 
termination within a lacO array reflects physiological termination. e, n-n,
n-sc, sc-sc, n and sc products were also detected when plasmid lacking 
lacO sequences (pBlueScript) was replicated in the absence of LacR without 
the use of cyclin A to synchronize replication. Therefore, these intermediates 
arise in the course of unperturbed DNA replication in Xenopus egg extracts. 
Extended Data Figure 4 | Inhibition of termination by different-sized LacR
arrays. a, Cartoon depicting intermediates detected in the dissolution assay.
b, To determine the ability of different-sized LacR arrays to inhibit termination,
the earliest stage of termination, dissolution (a), was monitored in plasmids
containing 0, 12, 16, 32, or 48 lacO repeats. Plasmids were incubated with LacR,
and replicated in the presence of [α-³²P]dATP. To measure dissolution,
radiolabelled termination intermediates were cut with XmnI. Cleaved products
were separated on a native agarose gel and detected by autoradiography.
c, Quantification of dissolution in b. When 12 or more lacO repeats were
present in the array, dissolution was robustly inhibited for at least 5 min. Potent
inhibition lasted 10 min when 32 lacO sequences were present, and 20 min
in the presence of 48 lacO sequences. In the absence of lacO sequences,
dissolution was essentially complete by 5 min. Therefore, 12 lacO repeats are
sufficient to inhibit termination for 5 min.
Extended Data Figure 5 | The rate of total DNA synthesis does not slow
before dissolution. a-c, To test further whether replication stalls or slows
before dissolution, p[lacOx12] was pre-incubated with LacR and replicated in
Xenopus egg extracts. Termination was then induced by addition of IPTG after
5 min. Simultaneously, [α-³²P]dATP was added to specifically radiolabel
DNA synthesized after IPTG addition (a). Radiolabelled DNA was then 
separated on a native agarose gel and total signal was measured by 
autoradiography (b). Total signal was quantified, normalized to peak signal, 
and graphed alongside the rate of dissolution, which was also measured in the 
same experiment (c). This approach gives a highly sensitive measure of 
DNA synthesis without manipulation of DNA samples. DNA synthesis should 
occur primarily within the lacO array (see Extended Data Fig. 1). Upon 
IPTG addition, there was an approximately linear increase in signal, which 
plateaued by 5.83 min. Importantly, dissolution was 65% complete by 5.83 min. 
Therefore, the large majority of dissolution occurs without stalling of DNA
synthesis. d, e, Experimental repeats of b, c. f, The experiments shown in
c-e were graphed together with mean ± s.d. Synthesis data were normalized so
that for each experiment, synthesis at 1 min was assigned a value of 84.4%, 
since this was the average value from c, d, where synthesis was allowed to 
plateau. Given the rate of replication fork progression in these egg
extracts (260 bp min⁻¹ (ref. 32)) and the size of the array (365 bp), forks
should require, on average, 0.7 min to converge if no stalling occurs
($(365\ \mathrm{bp}/2)/(260\ \mathrm{bp\,min^{-1}}) \approx 0.7\ \mathrm{min}$). The time required for dissolution was
not appreciably longer than this (dissolution was 50% complete by 0.67 min
after IPTG addition, f), consistent with a lack of stalling. g, h, The experiment
shown in b, c was repeated using p[lacOx16]. Synthesis was approximately
linear until 6.17 min, at which point 81% of molecules had dissolved, further 
demonstrating that the majority of dissolution occurs without stalling of 
DNA synthesis. 
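The convergence estimate above, and the rescaling used to combine experiments in f, amount to the short calculation sketched here. The function names are hypothetical, and synthesis data are assumed to be stored as a plain list of per-time-point values.

```python
def convergence_time_min(array_bp, fork_rate_bp_per_min):
    """Expected time for two converging forks to meet if neither stalls.

    Each fork enters from one end of the array, so each covers half of it.
    """
    return (array_bp / 2) / fork_rate_bp_per_min


def rescale_to_reference(values, ref_index, ref_value=84.4):
    """Scale a synthesis series so that values[ref_index] equals ref_value.

    84.4% is the 1-min value used as the reference in panel f.
    """
    factor = ref_value / values[ref_index]
    return [v * factor for v in values]
```

With the values given for the 12-lacO array (365 bp, 260 bp min⁻¹), `convergence_time_min` returns about 0.70 min.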
Extended Data Figure 6 | Replisome progression through 12 and 32 lacO
arrays. a-d, To test whether replisomes meet later in a lacOx32 array than a
lacOx12 array, we monitored dissolution. LacR block-IPTG release was
performed on p[lacOx12] and p[lacOx32] and radiolabelled termination
intermediates were digested with XmnI to monitor the conversion of double-Y
molecules to linear molecules (dissolution). Cleaved molecules were separated
on a native agarose gel, detected by autoradiography (a, c), and quantified
(b, d). Upon IPTG addition, dissolution was delayed by at least 1 min within the
32 lacO array compared to the 12 lacO array (b, d). Moreover, by 6 min, 92%
of forks had undergone dissolution on p[lacOx12] while only 9% had
dissolved on p[lacOx32] (b, d). e, Stall products within the 12 lacO array (Fig. 3b,
lane 2) were quantified, signal was corrected based on size differences of the 
products, and the percentage of stall products at each stall point was calculated. 
78% of leading strands stalled at the first three arrest points (red columns), 
19% stalled at the fourth to tenth arrest points (yellow columns) and the 
remaining 3% stalled at the tenth to fourteenth arrest points (grey columns). 
The appearance of fourteen arrest points is reproducible but surprising, given 
that the presence of only 12 lacO sequences was confirmed by sequencing in the 
very preparation of p[lacOx12] that was used in Fig. 3. The thirteenth and
fourteenth arrest points cannot stem from cryptic lacO sites beyond the
twelfth lacO site, as this would position the first leftward leading strand stall
product ~90 nucleotides from the lacO array, instead of the observed ~30
nucleotides (see f, g). At present, we do not understand the origin of these stall 
products. f, g, Progression of leftward leading strands into the array. The 
same DNA samples used in Fig. 3 were digested with the nicking enzyme 
Nb.BsrDI, which released leftward leading strands (f), and separated on a 
denaturing polyacrylamide gel (g). The lacO sites of p[lacOx12] are highlighted
in blue on the sequencing ladder (g), which was generated using the primer 
JDO109 (green arrow, f). Green circles indicate two nonspecific products of 
digestion. These products arise because nicking enzyme activity varies between 
experiments, even under the same conditions. There was no significant 
difference in the pattern of leftward leading strand progression between the 12 
lacO and 32 lacO arrays, as seen for the rightward leading strands (Fig. 3b). 
Specifically, by 5.67 min, the majority of leading strands had extended beyond 
the seventh lacO repeat within lacOx12 (lane 6) and the equivalent region of
lacOx32 (lane 18). Therefore, progression of leftward leading strands is
unaffected by the presence of an opposing replisome, suggesting that 
converging replisomes do not stall when they meet. 
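Panel e states only that stall-product signal was corrected for size differences. One plausible correction, sketched here as an assumption rather than as the procedure actually used, divides each band's signal by its length (on the premise that [α-³²P]dATP incorporation scales with strand length) before converting to percentages. The function name and input format are hypothetical.

```python
def stall_percentages(stalls):
    """Percentage of molecules at each stall point after size correction.

    stalls: list of (length_nt, raw_signal) pairs, one per arrest point.
    Assumes radiolabel signal scales linearly with strand length, so
    signal / length is proportional to the number of molecules.
    """
    corrected = [signal / length for length, signal in stalls]
    total = sum(corrected)
    return [100.0 * c / total for c in corrected]
```

Under this assumption, two bands of equal raw signal but 100 and 200 nt in length represent two-thirds and one-third of the molecules, respectively.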
Extended Data Figure 7 | Supplementary ChIP data. a, Cartoon depicting
the LAC, FLK2 and FAR loci, which were used for ChIP. Their precise locations
relative to the leftward edge of the lacO array are indicated. The LAC amplicon
is present in four copies distributed across the lacOx16 array and three copies
distributed across the lacOx12 array. b-e, p[lacOx12] was incubated with buffer
or LacR and termination was induced at 5 min by IPTG addition. MCM7, RPA,
CDC45 and Pol ε ChIP was performed at different time points after IPTG
addition, as well as in the buffer control and no-IPTG control. Recovery of FLK2
was measured as a percentage of input DNA. Upon IPTG addition, ChIP signal
declined and by 9 min was comparable to the buffer control, demonstrating
that unloading of replisomes was induced within 4 min of IPTG addition. f, To
test whether movement of the replisome into and out of the lacO array could
be detected upon IPTG addition, termination was monitored within a lacO 
array, and we performed ChIP of the leading strand polymerase Pol ε, which
was inferred to move into and out of the array based on the behaviour of leading
strands during termination (Extended Data Fig. 2b-e). It was predicted that
Pol ε ChIP at the LAC locus should increase slightly as Pol ε enters the lacO array
and decline again as converging polymerases pass each other, but persist at
FLK2 while the polymerases move out of the array. Before IPTG addition, Pol ε
was enriched at LAC and FLK2 compared to FAR, consistent with the leading
strands being positioned on either side of the lacO array (Extended Data Fig. 2c
and Fig. 3). Upon IPTG addition, Pol ε became modestly enriched at LAC
compared to FLK2 (5.5 min) but then declined to similar levels at both LAC and
FLK2 by 6.5 min. These data are consistent with the leading strand polymerases
entering the lacO array and passing each other. g, h, To test whether CMG
exhibited the same ChIP profile as Pol ε, MCM7 and CDC45 ChIP was
performed using the same samples. After IPTG addition, MCM7 and CDC45
were enriched at LAC compared to FLK2 (5.5 min), then declined to similar
levels at both LAC and FLK2 by 6.5 min, as seen for Pol ε (f). These data are
consistent with a model in which CMGs enter the array and pass each other
during termination. A caveat of these experiments is the relatively high recovery
of the FAR locus in MCM7, CDC45 and Pol ε ChIP. Specifically, signal was
at most only ~2-fold enriched at LAC compared to FAR. This was not due to
high background binding, because by the end of the experiment (10 min
time point, not shown) we observed a decrease in signal of ~5-7-fold.
Furthermore, we observed ~5-7-fold enrichment in binding (ChIP) of
replisome components to p[lacOx12] that had been incubated with LacR compared
to a buffer control (see g-i, below). Instead, the high FAR signal was
probably due to poor spatial resolution of the ChIP. Consistent with this, when 
a plasmid containing a DNA interstrand cross-link (ICL) was replicated, 
essentially all replisomes converged upon the ICL but the ChIP signal for 
MCM7 and CDC45 was only ~3-4-fold enriched at the ICL compared to a 
control locus*'. We speculate that the higher background observed at the 
control locus in our experiments is due to the decreased distance of the control 
locus from the experimental locus (1.3 kb for p[lacOx16] and p[lacOx12] versus
2.4 kb for the ICL plasmid) and possibly due to increased catenation of the
parental strands during termination. The high signal at FAR should not
complicate interpretation of the MCM7, CDC45 and Pol ε ChIP (f), as signal at
FAR was essentially unaltered between 5 and 6.5 min. Further evidence that the
high signal seen at the FAR locus emanates from forks stalled near the lacO
array is presented in panel k. i, ChIP of RPA was performed on the same
chromatin samples used in b-d. As seen for Pol ε, MCM7 and CDC45,
enrichment of RPA at LAC compared to FAR was relatively low, consistent with
poor spatial resolution. j, Predicted binding of CMGs to the LAC, FLK2 and
FAR loci before and after IPTG addition if converging CMGs pass each other.
k, To determine whether most forks stalled at the array and not elsewhere in 
the plasmid, we performed a time course in which p[lacOx16] undergoing
termination was examined by 2D gel electrophoresis at various time points.
p[lacOx16] was pre-bound to LacR and replicated in Xenopus egg extract
containing [α-³²P]dATP. Termination was induced by IPTG addition and
samples were withdrawn at different times. Radiolabelled replication 
intermediates were cleaved with XmnI (as in Extended Data Fig. 1a) and 
separated according to size and shape on 2D gels”. A parallel reaction was 
performed in which samples were analysed by ChIP, which was one of the 
repeats analysed in b-e. In the presence of LacR, a subset of double-Y molecules 
accumulated (blue arrowhead), demonstrating that 83% of replication 
intermediates (signal in dashed blue circle) contained two forks converged at a 
specific locus. After IPTG addition, linear molecules rapidly accumulated 
(orange arrowhead) as dissolution occurred. Importantly, the vast majority of 
signal was present in the discrete double-Y and linear species (blue and orange 
arrowheads), demonstrating that the relatively high ChIP signal observed at 
FAR in panels f-i was derived from forks present at the lacOx16 array and not
elsewhere. 
Extended Data Figure 8 | Supplementary termination data for p[empty]
experiments. a, Cartoon depicting the XmnI and AlwNI sites on p[empty], 
which are used for the dissolution and ligation assays, respectively, and the 
FLK2 locus, which is used for ChIP. b, Plasmid DNA without a lacO array 
(p[empty]) was replicated and at different times chromatin was subjected to 
MCM7 and CDC45 ChIP. Per cent recovery of FLK2 was quantified and used to 
measure dissociation of MCM7 and CDC45 (see Methods). Dissolution and 
ligation were also quantified in parallel. Mean ± s.d. is plotted (n = 3). The
MCM7 and CDC45 dissociation data are obtained from the vehicle controls in
Fig. 5b, c, while the dissolution and ligation data are obtained from the vehicle
controls in Fig. 5d, e. c, To seek independent evidence for the conclusions
of the ChIP data presented in Fig. 5b, c, we used a plasmid pull-down
procedure. p[empty] was replicated in egg extracts treated with vehicle or
Ub-VS. At the indicated times, chromatin-associated proteins were captured on
LacR-coated beads (which bind DNA independently of lacO sites) and
analysed by western blotting for CDC45, MCM7 and PCNA. CDC45 and 
MCM7 dissociated from chromatin by 8 min in the vehicle control, but 
persisted following Ub-VS treatment. d, To test whether the MCM7 
modifications detected in panel c represented ubiquitylation, extracts were 
incubated with His6-ubiquitin in the absence of cyclin A, and in the absence or
presence of plasmid DNA. After 15 min, His6-tagged proteins were captured
by nickel resin pull down and blotted for MCM7. DNA replication greatly
increased the levels of ubiquitylated MCM7, with the exception of a single 
species that was ubiquitylated independently of DNA replication (*). These 
data show that MCM7 is ubiquitylated during plasmid replication in egg 
extracts, as observed in yeast and during replication of sperm chromatin after 
nuclear assembly in egg extracts”*”’. e, In parallel to the plasmid pull downs 
performed in c, DNA samples were withdrawn for dissolution, ligation and 
decatenation assays, none of which was perturbed by Ub-VS treatment. These 
data support our conclusion, based on ChIP experiments (Fig. 5), that 
defective CMG unloading does not affect dissolution, ligation, or decatenation. 
f, Decatenation was measured in the same reactions used to measure 
dissolution and ligation (Fig. 5d, e), mean ± s.d. is plotted (n = 3). g-i, Given
the experimental variability at the 4 min time point in Fig. 5d-f, the primary 
data and quantification for dissolution (g), ligation (h) and decatenation (i) for 
one of the three experiments summarized in Fig. 5d-f is presented. This 
reveals that Ub-VS does not inhibit dissolution, ligation, or decatenation at the 
4 min time point. The same conclusion applies to two additional repetitions 
of this experiment (data not shown). j, The primary ChIP data used to 
measure dissociation of MCM7 and CDC45 in Fig. 5b, c is shown. Recovery 
of FLK2 was measured. Mean ± s.d. is plotted (n = 3).
Extended Data Figure 9 | Cyclin A treatment synchronizes DNA replication
in Xenopus egg extracts. a, b, To synchronize DNA replication in Xenopus
egg extracts, we treated extracts with cyclin A, which probably accelerates
replication initiation®. Plasmid DNA was incubated in high-speed
supernatant for 20 min, then either buffer or cyclin A was added for a further
20 min. Nucleoplasmic extract (NPE) was added to initiate DNA replication, along
with [α-³²P]dATP to label replication intermediates. Replication products were
separated on a native agarose gel, detected by autoradiography (a), and
quantified (b). In the presence of vehicle, replication was not complete by
9.5 min, but in the presence of cyclin A, replication was almost complete by
4.5 min (b). Thus, cyclin A treatment approximately doubles the speed of DNA 
replication in Xenopus egg extracts. c-f, To test whether cyclin A affects the 
ability of LacR to inhibit termination, we monitored dissolution of plasmids 
containing a 12 or 16 LacR array in the presence and absence of cyclin A. 
p[lacOx12], p[lacOx16], and the parental control plasmid p[empty] were incubated
with LacR, and then treated with buffer or cyclin A before replication was
initiated with NPE in the presence of [α-³²P]dATP. Samples were withdrawn
when dissolution of p[empty] plateaued (9.5 min in the presence of buffer,
4.5 min in the presence of cyclin A). Given that cyclin A treatment 
approximately doubles the speed of replication (see b), samples were 
withdrawn from these reactions twice as frequently as the buffer-treated 
samples. To measure dissolution, radiolabelled termination intermediates were 
cut with XmnI to monitor the conversion of double-Y molecules to linear 
molecules. Cut molecules were separated on a native agarose gel and detected 
by autoradiography (c, e). By the time the first sample was withdrawn, 
dissolution of p[empty] was essentially complete, in the absence (9.5 min, d) or 
presence (4.5 min, f) of cyclin A. Importantly, dissolution of p[lacOx12] and
p[lacOx16] was prevented in the absence (9.5 min, d) or presence (4.5 min, f) of
cyclin A. Moreover, dissolution occurred approximately twice as fast in the 
presence of cyclin A (note the similarity between d and f even though samples 
are withdrawn twice as frequently in f) consistent with replication being 
approximately twice as fast in the presence of cyclin A. Therefore, cyclin A does 
not affect the ability of a LacR array to block replication forks. 
Extended Data Table 1 | Tables of plasmids and oligonucleotides used

A. Plasmids

Plasmid | Insert | Construction
pJD82 | SacI-BsrGI-(lacO)x4-BsiWI-KpnI | Replacement of the sequence between SacI and KpnI of pBluescript II KS-
pJD85 | SacI-BsrGI-(lacO)x8-BsiWI-KpnI | JDO38/39 annealed and cloned into pJD82 that had been cut with BsrGI
pJD88 | SacI-BsrGI-(lacO)x16-BsiWI-KpnI | BsrGI/BsiWI fragment from pJD85 cloned into pJD85 that had been cut with BsrGI
pJD92 | SacI-BsrGI-(lacO)x32-BsiWI-KpnI | BsrGI/BsiWI fragment from pJD88 cloned into pJD88 that had been cut with BsrGI
pJD100 | SacI-BsrGI-(lacO)x48-BsiWI-KpnI | BsrGI/BsiWI fragment from pJD88 cloned into pJD92 that had been cut with BsrGI
pJD104 | SacI-BsrGI-(lacO)x12-BsiWI-KpnI | JDO38/39 annealed and cloned into pJD85 that had been cut with BsrGI
pJD105 | SacI-Nb.BsmI-BsiWI-Nt.BbvCI-KpnI | Replacement of the sequence between SacI and KpnI of pBluescript II KS- with JDO42/43
pJD139 | SacI-Nb.BsmI-BsiWI-Nt.BbvCI-KpnI | Quickchange mutagenesis of pJD105 using JDO94/95
pJD145 | SacI-Nb.BsmI-BsiWI-Nt.BbvCI-KpnI | Quickchange mutagenesis of pJD139 using JDO100/101
pJD150 | SacI-Nb.BsmI-(lacO)x12-BsiWI-Nt.BbvCI-KpnI | BsrGI/BsiWI fragment from pJD104 cloned into pJD145 that had been cut with BsiWI
pJD152 | SacI-Nb.BsmI-(lacO)x16-BsiWI-Nt.BbvCI-KpnI | BsrGI/BsiWI fragment from pJD88 cloned into pJD145 that had been cut with BsiWI
pJD156 | SacI-Nb.BsmI-(lacO)x32-BsiWI-Nt.BbvCI-KpnI | BsrGI/BsiWI fragment from pJD92 cloned into pJD145 that had been cut with BsiWI
pQNT | - | pCDFDuet-1 containing a HincII site (pQuant from ref. 4)

B. Oligonucleotides

Oligo | Sequence | Description
JDO38 | 5'-GTACATCAATTGTGAGCGGATAACAATTGTTAGGGAGGAATTGTGAGCGGATAACAATTTGGAGTTGATAATTGTGAGCGGATAACAATTGGCTTCAACGTAATTGTGAGCGGATAACAATTTCC-3' | Can be annealed to JDO39 to generate dsDNA containing 4x lacO sites with ends that are compatible with BsiWI and BsrGI.
JDO39 | 5'-GTACGGAAATTGTTATCCGCTCACAATTACGTTGAAGCCAATTGTTATCCGCTCACAATTATCAACTCCAAATTGTTATCCGCTCACAATTCCTCCCTAACAATTGTTATCCGCTCACAATTGAT-3' | Can be annealed to JDO38 to generate dsDNA containing 4x lacO sites with ends that are compatible with BsiWI and BsrGI.
JDO42 | 5'-CTGTACAGCATTCCCATGGCGTACGTTCTAGACCTCAGCTATGGTACC-3' | Can be annealed to JDO43 to generate dsDNA containing sites for BsrGI-Nb.BsmI-NcoI-BsiWI-XbaI-Nb.BbvCI with ends that are compatible with SacI and KpnI.
JDO43 | 5'-AGCGGTACCATAGCTGAGGTCTAGAACGTACGCCATGGGAATGCTGTACAGAGCT-3' | Can be annealed to JDO42 to generate dsDNA containing sites for BsrGI-Nb.BsmI-NcoI-BsiWI-XbaI-Nb.BbvCI with ends that are compatible with SacI and KpnI.
JDO94 | 5'-TAAGGGATTTTGCCGATTTCGGCCTATGCTCTTCGCAGTGTGGTTAAAAAATGAGC-3' | Used with JDO95 to introduce Nt.BspQI and Nb.BtsI sites upstream of Nb.BsmI in pJD105-derived plasmids by Quickchange mutagenesis.
JDO95 | 5'-GCTCATTTTTTAACCACACTGCGAAGAGCATAGGCCGAAATCGGCAAAATCCCTTA-3' | Used with JDO94 to introduce Nt.BspQI and Nb.BtsI sites upstream of Nb.BsmI in pJD105-derived plasmids by Quickchange mutagenesis.
JDO100 | 5'-TGAGCGTCGATTCATTGCTTTGTGATGCTCGTCAGGGGG-3' | Used with JDO101 to introduce an Nb.BsrDI site downstream of BbvCI in pJD105-derived plasmids by Quickchange mutagenesis.
JDO101 | 5'-CCCCCTGACGAGCATCACAAAGCAATGAATCGACGCTCA-3' | Used with JDO100 to introduce an Nb.BsrDI site downstream of BbvCI in pJD105-derived plasmids by Quickchange mutagenesis.
JDO107 | 5'-CAGTGTGGTTAAAAAATGAGCTG-3' | Sequencing primer for mapping leading strands released by Nt.BspQI digestion.
JDO109 | 5'-CATTGCTTTGTGATGCTCGT-3' | Sequencing primer for mapping leading strands released by Nb.BsrDI digestion.
JDO110 | 5'-TGGTTAAAAAATGAGCTGATTTAACA-3' | Sequencing primer for mapping lagging strands released by Nb.BtsI digestion.
JDO111 | 5'-TGAGGTCTAGAACGTACGGAAA-3' | Sequencing primer for mapping leading strands released by Nb.BbvCI digestion.
FLK2_F | 5'-TCTTCGCTATTACGCCAGCT-3' | Used with FLK2_R to amplify the region 82-173 bases upstream of the lacO array in pJD152
FLK2_R | 5'-TTACAACGTCGTGACTGGGA-3' | Used with FLK2_F to amplify the region 82-173 bases upstream of the lacO array in pJD152
LAC_F | 5'-AGCGGATAACAATTGTTAGGGA-3' | Used with LAC_R to amplify four sites within the lacO array in pJD152
LAC_R | 5'-CTCACAATTACGTTGAAGCCAA-3' | Used with LAC_F to amplify four sites within the lacO array in pJD152
FAR_F | 5'-ATTGCTACAGGCATCGTGGT-3' | Used with FAR_R to amplify the region 1286-1375 bases upstream of the lacO array in pJD152
FAR_R | 5'-GGGATCATGTAACTCGCCTTGA-3' | Used with FAR_F to amplify the region 1286-1375 bases upstream of the lacO array in pJD152
QNT_F | 5'-TACAAATGTACGGCCAGCAA-3' | Used with QNT_R to amplify pQNT
QNT_R | 5'-GAGTATGAGGGAAGCGGTGA-3' | Used with QNT_F to amplify pQNT
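The annealing and mutagenesis pairings in Table B can be sanity-checked by reverse-complementing one oligonucleotide of each pair. The sketch below is an illustration and not part of the original methods. It assumes the sequences above were transcribed correctly. The helper names are hypothetical. The lacO motif used for counting, AATTGTGAGCGGATAACAATT, is an assumption here. The sketch checks three things. First, JDO38 and JDO39 form a duplex with 5'-GTAC overhangs at both ends, which is the overhang left by both BsrGI (T^GTACA) and BsiWI (C^GTACG). Second, JDO38 carries four lacO motifs. Third, the JDO94/95 and JDO100/101 mutagenesis primers are exact complements.

```python
# Oligonucleotide sequences (5'->3') transcribed from Extended Data Table 1B.
JDO38 = ("GTACATCAATTGTGAGCGGATAACAATTGTTA"
         "GGGAGGAATTGTGAGCGGATAACAATTTGGAGTTG"
         "ATAATTGTGAGCGGATAACAATTGGCTTCAACGTA"
         "ATTGTGAGCGGATAACAATTTCC")
JDO39 = ("GTACGGAAATTGTTATCCGCTCACAATTACGT"
         "TGAAGCCAATTGTTATCCGCTCACAATTATCAACT"
         "CCAAATTGTTATCCGCTCACAATTCCTCCCTAACA"
         "ATTGTTATCCGCTCACAATTGAT")
JDO94 = "TAAGGGATTTTGCCGATTTCGGCCTATGCTCTTCGCAGTGTGGTTAAAAAATGAGC"
JDO95 = "GCTCATTTTTTAACCACACTGCGAAGAGCATAGGCCGAAATCGGCAAAATCCCTTA"
JDO100 = "TGAGCGTCGATTCATTGCTTTGTGATGCTCGTCAGGGGG"
JDO101 = "CCCCCTGACGAGCATCACAAAGCAATGAATCGACGCTCA"

LAC_OPERATOR = "AATTGTGAGCGGATAACAATT"   # assumed lacO motif, used only for counting
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def anneals_with_overhangs(top: str, bottom: str, overhang: str) -> bool:
    """True if both oligos start with the same 5' overhang and the remaining
    bases pair perfectly, i.e. the duplex carries that overhang at both ends."""
    k = len(overhang)
    return (top.startswith(overhang) and bottom.startswith(overhang)
            and top[k:] == reverse_complement(bottom[k:]))


print("JDO38/39 duplex with 5'-GTAC ends:", anneals_with_overhangs(JDO38, JDO39, "GTAC"))
print("lacO motifs in JDO38:", JDO38.count(LAC_OPERATOR))
print("JDO94/95 complementary:", JDO95 == reverse_complement(JDO94))
print("JDO100/101 complementary:", JDO101 == reverse_complement(JDO100))
```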
Relativistic boost as the cause of periodicity in a 
massive black-hole binary candidate 


Daniel J. D’Orazio¹, Zoltan Haiman¹ & David Schiminovich¹ 


Because most large galaxies contain a central black hole, and galaxies 
often merge¹, black-hole binaries are expected to be common 
in galactic nuclei². Although they cannot be imaged, periodicities 
in the light curves of quasars have been interpreted as evidence for 
binaries³⁻⁵, most recently in PG 1302-102, which has a short rest-frame 
optical period of four years (ref. 6). If the orbital period of 
the black-hole binary matches this value, then for the range of 
estimated black-hole masses, the components would be separated 
by 0.007-0.017 parsecs, implying relativistic orbital speeds. There 
has been much debate over whether black-hole orbits could be 
smaller than one parsec (ref. 7). Here we report that the amplitude 
and the sinusoid-like shape of the variability of the light curve of 
PG 1302-102 can be fitted by relativistic Doppler boosting of emis- 
sion from a compact, steadily accreting, unequal-mass binary. We 
predict that brightness variations in the ultraviolet light curve 
track those in the optical, but with a two to three times larger 
amplitude. This prediction is relatively insensitive to the details 
of the emission process, and is consistent with archival ultraviolet 
data. Follow-up ultraviolet and optical observations in the next few 
years can further test this prediction and confirm the existence of a 
binary black hole in the relativistic regime. 

Assuming PG 1302-102 is a binary, it is natural to attribute its 
optical emission to gas that is bound to each black hole, forming 
circumprimary and circumsecondary accretion flows. Such flows, 
which form ‘minidisks’, are generically found in high-resolution 
two- and three-dimensional hydrodynamic simulations that include 
the black holes in their simulated domain*'*. Assuming a circular 
orbit, the velocity of the lower-mass secondary black hole is 


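$$v_2 = \frac{1}{1+q}\left(\frac{2\pi G M}{P}\right)^{1/3} \approx 8{,}500~\mathrm{km~s^{-1}}\left(\frac{1.5}{1+q}\right)\left(\frac{M}{10^{8.5}M_\odot}\right)^{1/3}\left(\frac{P}{4.04~\mathrm{yr}}\right)^{-1/3}$$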


or approximately 0.03c for the fiducial parameters chosen in the 
parentheses on the right ($q = 0.5$, $M = 10^{8.5}M_\odot$, $P = 4.04$ yr), where 
$M = M_1 + M_2$ is the total binary mass, $M_{1,2}$ are the individual masses, 
$q \equiv M_2/M_1 \le 1$ is the mass ratio, $M_\odot$ is the mass of the Sun, $P$ is the 
orbital period, $G$ is the gravitational constant, and $c$ is the speed of light. 
The orbital velocity of the higher-mass primary black hole is $v_1 = q v_2$. 
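The fiducial numbers above can be reproduced directly from the circular-orbit formula. The sketch below assumes rounded cgs constants. The function name is hypothetical.

```python
import math

G = 6.674e-8        # gravitational constant [cm^3 g^-1 s^-2]
M_SUN = 1.989e33    # solar mass [g]
YEAR = 3.156e7      # seconds per year
C_KMS = 2.998e5     # speed of light [km/s]


def secondary_speed_kms(m_total_msun: float, q: float, period_yr: float) -> float:
    """Circular-orbit speed of the secondary, v2 = (2 pi G M / P)^(1/3) / (1 + q), in km/s."""
    m = m_total_msun * M_SUN
    p = period_yr * YEAR
    v_rel = (2.0 * math.pi * G * m / p) ** (1.0 / 3.0)   # relative orbital speed [cm/s]
    return v_rel / (1.0 + q) / 1e5                        # secondary's share [km/s]


q = 0.5
v2 = secondary_speed_kms(10**8.5, q, 4.04)
v1 = q * v2                                               # primary: v1 = q v2
print(f"v2 = {v2:,.0f} km/s ({v2 / C_KMS:.3f}c), v1 = {v1:,.0f} km/s")
```

With the fiducial values this gives a secondary speed of about 8,500 km s⁻¹, or roughly 0.028c, which the text rounds to 0.03c.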
Even if a minidisk has a steady intrinsic rest-frame luminosity, its 
apparent flux on Earth is modulated by relativistic Doppler beaming. 
The photon frequencies suffer a relativistic Doppler shift by the factor 
$D = [\Gamma(1 - \beta_\parallel)]^{-1}$, where $\Gamma = (1 - \beta^2)^{-1/2}$ is the Lorentz factor, 
$\beta = v/c$ is the three-dimensional velocity $v$ in units of the speed of light 
$c$, and $\beta_\parallel = \beta\cos(\phi)\sin(i)$ is the component of the velocity along the 
line of sight, with $i$ and $\phi$ the orbital inclination and phase, respectively. 
Because the photon phase-space density, which is proportional to 
$F_\nu/\nu^3$, is invariant in special relativity, the apparent flux $F_\nu$ at a fixed 
observed frequency $\nu$ is modified from the flux of a stationary source 
$F^0_\nu$ to $F_\nu = D^3 F^0_{\nu/D} = D^{3-\alpha}F^0_\nu$. The last step assumes an intrinsic 
power-law spectrum $F^0_\nu \propto \nu^\alpha$. To first order in $v/c$, this 
causes a sinusoidal modulation of the apparent flux along the orbit, 
with fractional amplitude $\Delta F_\nu/F_\nu = \pm(3 - \alpha)(v/c)\cos(\phi)\sin(i)$. Although 
light-travel-time modulations appear at the same order, they are subdominant 
to the Doppler modulation. This modulation is analogous to 
periodic modulations from relativistic Doppler boost predicted’* and 
observed for extrasolar planets’”"* and for a double white-dwarf 
binary’, but here it has a much higher amplitude. 
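The exact boost $D^{3-\alpha}$ and its first-order approximation can be compared numerically. The sketch below uses $\beta = 0.03$, $i = 90°$ and $\alpha = 1.1$. These are illustrative choices taken from the fiducial speed and the V-band slope quoted in the text. The function names are hypothetical.

```python
import math


def doppler_factor(beta: float, phase: float, incl: float) -> float:
    """D = [Gamma (1 - beta_par)]^-1 with beta_par = beta cos(phase) sin(incl)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    beta_par = beta * math.cos(phase) * math.sin(incl)
    return 1.0 / (gamma * (1.0 - beta_par))


def boosted_flux_ratio(beta: float, phase: float, incl: float, alpha: float) -> float:
    """F_nu / F0_nu = D**(3 - alpha) for an intrinsic power law F0_nu ∝ nu**alpha."""
    return doppler_factor(beta, phase, incl) ** (3.0 - alpha)


beta, incl, alpha = 0.03, math.radians(90.0), 1.1
exact = boosted_flux_ratio(beta, 0.0, incl, alpha) - 1.0   # phase 0: source approaching
linear = (3.0 - alpha) * beta * math.sin(incl)              # first-order amplitude
print(f"exact: {exact:.4f}, first order: {linear:.4f}")
```

At these speeds the two differ only in the second decimal place of the percentage. This is why the sinusoidal first-order description is adequate.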

The light curve of PG 1302-102 is well measured over approximately 
two periods (approximately 10 yr). The amplitude of the variability is 
±0.14 mag (measured in the optical V band’°), which corresponds to 
$\Delta F_\nu/F_\nu \approx \pm 0.14$. The spectrum of PG 1302-102 in and around the 
V band is well approximated by a double power law, with $\alpha \approx 0.7$ 
(between 0.50 μm and 0.55 μm) and $\alpha \approx 1.4$ (between 0.55 μm and 
0.60 μm), except for small deviations caused by broad lines. We obtain 
an effective single slope $\alpha_{\rm eff} = 1.1$ over the entire V band. We conclude 
that the 14% variability can be attributed to relativistic beaming for a 
line-of-sight velocity amplitude of $v\sin(i) = 0.074c \approx 22{,}000$ km s$^{-1}$. 
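Inverting the first-order relation $\Delta F_\nu/F_\nu = (3-\alpha)\,v\sin(i)/c$ gives the quoted line-of-sight velocity. The following is a minimal sketch; the function name is hypothetical.

```python
C_KMS = 2.998e5   # speed of light [km/s]


def required_los_velocity(frac_amplitude: float, alpha: float):
    """Invert dF/F = (3 - alpha) * v sin(i) / c (first order in v/c).
    Returns (v sin(i) / c, v sin(i) in km/s)."""
    beta_los = frac_amplitude / (3.0 - alpha)
    return beta_los, beta_los * C_KMS


beta_los, v_kms = required_los_velocity(0.14, alpha=1.1)
print(f"v sin(i) = {beta_los:.3f} c = {v_kms:,.0f} km/s")
```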

Although large, this velocity can be realized for a massive (high-M) 
but unequal-mass (low-q) binary, whose orbit is viewed not too far 
from edge-on (high sin(i)). In Fig. 1, we show the required combina- 
tion of these three parameters that would produce a 0.14-mag vari- 
ability in the sum-total of Doppler-shifted emission from the primary 
and the secondary black hole. As this figure shows, the required mass is 
2 10°'Mo, consistent with the high end of the range that has been 
inferred for PG1302-102. The orbital inclination can be in the 
range i = 60°-90°. The mass ratio q has to be low, q < 0.3, which is 
consistent with expectations based on cosmological galaxy merger 
models”, and also with the identification of the optical and binary 
periods (for q ≳ 0.3, hydrodynamic simulations predict that the 
mass-accretion rates fluctuate with a period several times longer than 
the orbital period”’). 

As Fig. 1 shows, fully accounting for the observed optical variability 
also requires that the bulk of the optical emission arises from gas 
bound to the faster-moving secondary black hole ($f_2 \gtrsim 80\%$). We find 
that this condition is naturally satisfied for unequal-mass black holes. 
Hydrodynamic simulations have shown that for 0.03 < q < 0.1, the 
accretion rate onto the secondary black hole is a factor of 10-20 higher 
than that onto the primary’. Because the secondary captures most of 
the accreting gas from the circumbinary disk, the primary is ‘starved’, 
and radiates with a much lower efficiency. In the (M,q) ranges 
favoured by the beaming scenario, we find that the primary contributes 
less than 1% to the total luminosity, and the circumbinary disk con- 
tributes less than 20%, leaving the secondary as the dominant source of 
emission in the three-component system (see Methods). 

The optical light curve of PG 1302-102 appears remarkably sinus- 
oidal compared to that of the best-studied previous quasi-periodic 
quasar binary black-hole candidate, which shows periodic bursts*. 
Nevertheless, the light-curve shape deviates from a pure sinusoid. 
To see if such deviations naturally arise within our model, we max- 
imized the Bayesian likelihood over five parameters (period P, velocity 
amplitude K, eccentricity e, argument of pericentre $\omega$, and an arbitrary 
reference time $t_0$) of a Kepler orbit” and fitted the observed optical 
light curve. In this procedure, we accounted for additional stochas- 
tic physical variability with a broken-power-law power spectrum 
Figure 1 | Binary parameters producing the optical flux variations of 
PG 1302-102 by relativistic boost. Combinations of total binary mass M, 
mass ratio $q = M_2/M_1$, and inclination i that cause >13.5% flux variability (or 
line-of-sight velocity amplitude $(v/c)\sin(i) = 0.07$) in the emission from the 
primary and secondary black holes, computed from the Doppler factor $D^{3-\alpha}$ 
with the effective spectral slope of $\alpha_{\rm eff} = 1.1$ in the V band. The solid lines 


(a ‘damped random walk™*) described by two additional parameters. 
This analysis returns a best-fit with a non-zero eccentricity of 
e=0.09*9;52, although a Bayesian criterion does not favour this 
model over a pure sinusoid with fewer parameters (see Methods). 
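The eccentric fit relies on the line-of-sight velocity curve of a Kepler orbit. The sketch below assumes the standard radial-velocity form $K[\cos(\omega + f) + e\cos\omega]$, where $f$ is the true anomaly obtained by solving Kepler's equation with Newton iteration. The paper does not spell out its exact parametrization or sign convention, so these are assumptions. The function names and the example values are hypothetical. P, K, e and cos ω loosely follow the best-fit values in Methods, and t₀ is arbitrary.

```python
import numpy as np


def eccentric_anomaly(mean_anomaly, e: float, n_iter: int = 30):
    """Solve Kepler's equation E - e sin(E) = M by Newton iteration (0 <= e < 1)."""
    mean_anomaly = np.asarray(mean_anomaly, dtype=float)
    E = mean_anomaly.copy() if e < 0.8 else np.full_like(mean_anomaly, np.pi)
    for _ in range(n_iter):
        E -= (E - e * np.sin(E) - mean_anomaly) / (1.0 - e * np.cos(E))
    return E


def kepler_los_velocity(t, period: float, K: float, e: float, omega: float, t0: float):
    """Line-of-sight velocity shape K [cos(omega + f) + e cos(omega)] of a Kepler orbit."""
    M = 2.0 * np.pi * (np.asarray(t, dtype=float) - t0) / period   # mean anomaly
    E = eccentric_anomaly(M, e)
    f = 2.0 * np.arctan2(np.sqrt(1.0 + e) * np.sin(E / 2.0),       # true anomaly
                         np.sqrt(1.0 - e) * np.cos(E / 2.0))
    return K * (np.cos(omega + f) + e * np.cos(omega))


t = np.linspace(0.0, 4000.0, 200)                                   # days (hypothetical grid)
curve = kepler_los_velocity(t, 1996.0, 0.065, 0.09, np.arccos(-0.65), 0.0)
print(curve.min(), curve.max())
```

For e = 0 this reduces exactly to a sinusoid. This is why the circular beaming model and the sinusoid model are equivalent to first order.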
We considered an alternative model to explain optical variability of 
PG 1302-102, in which the luminosity variations track the fluctuations 
in the mass-accretion rate that is predicted in hydrodynamic simulations”"''!°?°. 
However, the amplitude of these hydrodynamic fluctuations is large 
(order one), and their shape is ‘bursty’ rather than 
sinusoid-like'’’*; as a result, we find that they provide a poorer fit 
to the observations (see Fig. 2 and Methods). Furthermore, for mass 
ratios q ≳ 0.05, hydrodynamic simulations predict a characteristic 
pattern of periodicities at multiple frequencies, but an analysis of 
the periodogram of PG 1302-102 has not uncovered evidence for mul- 
tiple peaks”®. 
Figure 2 | The optical and ultraviolet light curves of PG 1302-102. The grey 
filled circles with 1σ errors are the optical data⁶, superimposed with a best-fit 
sinusoid (red dashed curve). The solid black curve is the best-fit relativistic 
light curve. The blue dashed curve is the best-fit model that was obtained by 
scaling the mass-accretion rate determined from a hydrodynamic simulation of 
an unequal-mass (q = 0.1) binary¹¹. The red and blue filled circles with 1σ 
errors correspond to archival NUV (red) and FUV (blue) spectral observations; 
the red filled triangles (with 1σ errors) represent archival photometric NUV 
data (see Fig. 3). The UV data include an arbitrary overall normalization to 
match the mean optical brightness. The red and blue dotted curves are the best-fit 
relativistic optical light curves with amplitudes scaled up by factors of 2.17 
and 2.57, which best match the NUV and FUV data, respectively. MJD, 
modified Julian day. 
correspond to different values of q as labelled; the shaded regions correspond to 
intermediate values. We assume that a fraction $f_2$ = 1.0, 0.95 or 0.8 of the total 
luminosity arises from the secondary black hole; these values are consistent 
with fractions found in hydrodynamic simulations¹¹ (see Methods). The 
inclination angle is defined such that i = 0° corresponds to a face-on view of 
PG 1302-102, and i = 90° corresponds to an edge-on view. 


A simple observational test of relativistic beaming is possible, owing to 
the strong frequency dependence of the spectral slope of PG 1302-102, 
$\alpha = d\ln(F_\nu)/d\ln(\nu)$. The continuum spectrum of PG 1302-102 is nearly 
flat, with a slope $\beta_{\rm FUV} = d\ln(F_\lambda)/d\ln(\lambda) \approx 0$ in the far-ultraviolet (FUV; 
0.145–0.1525 μm) band, where $F_\lambda$ is the apparent flux at an observed 
wavelength $\lambda$, and shows a tilt with $\beta_{\rm NUV} = -0.95$ in the near-ultraviolet 
(NUV; 0.20–0.26 μm) range; see Fig. 3 and Methods. Because $F_\nu \propto \lambda^2 F_\lambda$, so that 
$\alpha = -(2 + \beta)$, these slopes translate to $\alpha_{\rm FUV} = -2$ and $\alpha_{\rm NUV} = -1.05$ in the respective bands, 
compared to $\alpha_{\rm opt} = 1.1$ in the optical. The UV emission can be attributed to the 
same minidisks that are responsible for the optical light, and would 
therefore share the same Doppler shifts in frequency. These Doppler 
shifts would translate into UV variability that is larger by a factor of 
$(3 - \alpha_{\rm FUV})/(3 - \alpha_{\rm opt}) = 5/1.9 = 2.63$ and $(3 - \alpha_{\rm NUV})/(3 - \alpha_{\rm opt}) = 4.05/1.9 = 2.13$ 
compared to the optical, reaching maximum amplitudes of 
±37% (FUV) and ±30% (NUV). 
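The UV-to-optical amplitude ratios follow from simple arithmetic on the measured slopes. The sketch below converts each wavelength slope β into a frequency slope α = −(2 + β). It then forms (3 − α_UV)/(3 − α_opt) and scales the 14% optical amplitude. The function and variable names are hypothetical.

```python
def alpha_from_beta(beta_lambda: float) -> float:
    """Convert d ln F_lambda / d ln lambda into d ln F_nu / d ln nu.
    Since F_nu ∝ lambda**2 F_lambda, alpha = -(2 + beta)."""
    return -(2.0 + beta_lambda)


ALPHA_OPT = 1.1
OPTICAL_AMPLITUDE = 0.14
ratios, amplitudes = {}, {}
for band, beta in [("FUV", 0.0), ("NUV", -0.95)]:
    alpha = alpha_from_beta(beta)
    ratios[band] = (3.0 - alpha) / (3.0 - ALPHA_OPT)     # UV / optical amplitude ratio
    amplitudes[band] = ratios[band] * OPTICAL_AMPLITUDE  # predicted UV amplitude
    print(f"{band}: alpha = {alpha:+.2f}, ratio = {ratios[band]:.2f}, "
          f"amplitude = ±{amplitudes[band]:.0%}")
```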
Figure 3 | Archival UV spectra of PG 1302-102 from 1992–2011. FUV and 
NUV spectra obtained by instruments on the HST and by GALEX, as labelled. 
COS, Cosmic Origins Spectrograph; FOS, Faint Object Spectrograph; STIS, 
Space Telescope Imaging Spectrograph. Numbers in brackets are the dates (in 
MJD − 49,100) on which the data were collected. Vertical yellow bands mark regions 
outside the spectroscopic range of both GALEX and the HST and contain no 
useful spectral data. Assignments of the main peaks are given. Lyα, Lyman α. 
From each spectrum, average flux measurements (shown in Fig. 2) were 
computed in one or both of the UV bands over the frequency range indicated by 
the horizontal bars. The full GALEX photometric band shapes for FUV and 
NUV photometry are shown for reference as shaded blue and red curves, 
respectively. Additional GALEX NUV photometric data were also used 
in Fig. 2. The UV spectra show offsets of as much as 30% relative to one 
another, close to the value expected from relativistic boost (see Methods). 
Five separate UV spectra of PG 1302-102 were collected 
between 1992 and 2011 by instruments on the Hubble Space 
Telescope (HST) and on the Galaxy Evolution Explorer (GALEX) 
satellite (see Fig. 3); additional photometric observations were taken 
with GALEX at four different times between 2006 and 2009 (shown in 
Fig. 2). The brightness in both the FUV and NUV bands 
shows variability that resembles the optical variability, but with a larger 
amplitude. Adopting the parameters of our best-fit sinusoid model, 
and allowing only the amplitude to vary, we find that the UV data 
yield best-fit variability amplitudes of $\Delta F_\nu/F_\nu|_{\rm FUV} = \pm(35.0 \pm 3.9)\%$ 
and $\Delta F_\nu/F_\nu|_{\rm NUV} = \pm(29.5 \pm 2.4)\%$ (shown in Fig. 2). These 
amplitudes are factors of 2.57 ± 0.28 and 2.17 ± 0.17 higher than in the 
optical, in excellent agreement with the values 2.63 and 2.13 that are 
expected from the corresponding spectral slopes. 
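The UV amplitudes come from a fit that fixes the period and phase of the best-fit optical sinusoid and frees only the amplitude. With independent Gaussian errors, this reduces to a one-parameter weighted linear least-squares problem. The sketch below assumes exactly that and ignores the correlated damped-random-walk noise described in Methods. The epochs, errors, period, phase and true amplitude are hypothetical.

```python
import numpy as np


def fit_amplitude(t, y, sigma, period: float, t0: float):
    """Weighted least-squares amplitude of y ≈ A sin(2 pi (t - t0) / period)
    with period and phase held fixed: A = Σ w s y / Σ w s², w = 1/σ²."""
    s = np.sin(2.0 * np.pi * (t - t0) / period)
    w = 1.0 / sigma**2
    amp = np.sum(w * s * y) / np.sum(w * s**2)
    amp_err = 1.0 / np.sqrt(np.sum(w * s**2))   # 1σ error for independent Gaussian errors
    return amp, amp_err


# Hypothetical UV epochs (days) and fractional deviations from the mean flux.
rng = np.random.default_rng(42)
t = np.array([100.0, 900.0, 2500.0, 3300.0, 4100.0, 5200.0])
sigma = np.full_like(t, 0.03)
period, t0 = 1996.0, 0.0
y = 0.35 * np.sin(2.0 * np.pi * (t - t0) / period) + rng.normal(0.0, sigma)
amp, amp_err = fit_amplitude(t, y, sigma, period, t0)
print(f"A = {amp:.3f} ± {amp_err:.3f}")
```

Dividing such a UV amplitude by the optical amplitude gives the ratios quoted above, which can then be compared with (3 − α_UV)/(3 − α_opt).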

Relativistic beaming provides a simple and robust explanation of the 
optical periodicity of PG 1302-102. The prediction that the larger UV 
variations should track the optical light curve can be tested rigorously 
in the future with measurements of the optical and UV brightness that 
are collected at or near the same time, are repeated two or more times, 
are separated by a few months to about 2 yr, and cover up to half of the 
optical period. A positive result will constitute the first detection of 
relativistic massive black-hole binary motion; it will also serve as a 
confirmation of the binary nature of PG 1302-102, remove the ambi- 
guity in the orbital period, and tightly constrain the binary parameters 
to be close to those shown in Fig. 1. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 

V-band emission from a three-component system in PG 1302-102. Here we 
assume that the PG 1302-102 supermassive black-hole (SMBH) binary system 
includes three distinct luminous components: a circumbinary disk (CBD), as well 
as actively accreting primary and secondary SMBHs. The optical brightness of each 
of the three components can be estimated once their accretion rates and the black-hole 
masses $M_1$ and $M_2$ are specified. Using the absolute V-band magnitude of 
PG 1302-102, $M_V = -25.81$, and applying a bolometric correction BC ≈ 10 (ref. 27), 
we infer a total bolometric luminosity of $L_{\rm bol} = 6.5(\mathrm{BC}/10) \times 10^{46}$ erg s$^{-1}$. 
Bright quasars with the most massive SMBHs ($M \gtrsim 10^9 M_\odot$) have a typical radiative 
efficiency of $\epsilon \approx 0.3$ (ref. 28). Adopting this value, the implied accretion rate is 
$\dot M = L_{\rm bol}/(\epsilon c^2) \approx 3.7 M_\odot$ yr$^{-1}$ (where the overdot indicates differentiation with 
respect to time). 
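The accretion-rate estimate is a one-line unit conversion. The sketch below assumes rounded cgs constants. The function name is hypothetical.

```python
C = 2.998e10        # speed of light [cm/s]
M_SUN = 1.989e33    # solar mass [g]
YEAR = 3.156e7      # seconds per year


def accretion_rate_msun_per_yr(l_bol: float, efficiency: float) -> float:
    """Mdot = L_bol / (efficiency * c^2), converted from g/s to M_sun/yr."""
    return l_bol / (efficiency * C**2) * YEAR / M_SUN


mdot = accretion_rate_msun_per_yr(6.5e46, efficiency=0.3)
print(f"Mdot ≈ {mdot:.1f} M_sun/yr")
```

With these rounded inputs the result is about 3.8 M⊙ yr⁻¹, close to the 3.7 M⊙ yr⁻¹ quoted above. The small difference presumably reflects rounding of $L_{\rm bol}$.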

We identify this as the total accretion rate through the CBD, and require that at 
small radii the rate is split between the two black holes, $\dot M = \dot M_1 + \dot M_2$, with the 
ratio $\gamma \equiv \dot M_2/\dot M_1$. Hydrodynamic simulations¹¹ have found that the secondary 
black hole captures the large majority of the gas, with $10 \lesssim \gamma \lesssim 20$ for 
$0.03 \lesssim q \lesssim 0.1$ (where $q = M_2/M_1$). Defining the Eddington ratio of the ith disk 
as its accretion rate scaled by its Eddington-limited rate, $\dot m_{i,\rm Edd} \equiv \dot M_i/\dot M_{i,\rm Edd}$, with 
$\dot M_{i,\rm Edd} = L_{i,\rm Edd}/(0.1c^2)$ (here $L_{i,\rm Edd}$ is the Eddington luminosity for the ith black hole, 
and we have adopted the fiducial radiative efficiency of $\epsilon = 0.1$ to be consistent 
with the standard definition in the literature), we have 


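$$\dot m_{1,\rm Edd} = \frac{1+q}{1+\gamma}\,\frac{\dot M}{\dot M_{\rm Edd}}, \qquad \dot m_{2,\rm Edd} = \frac{\gamma(1+q)}{q(1+\gamma)}\,\frac{\dot M}{\dot M_{\rm Edd}}, \qquad \dot M_{\rm Edd} \equiv \frac{L_{\rm Edd}(M)}{0.1c^2},$$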
where the subscripts 1 and 2 refer to the primary and the secondary black holes, 
respectively. We adopt a standard, radiatively efficient, geometrically thin, optic- 
ally thick Shakura-Sunyaev disk model” to compute the luminosities produced 
in the CBD and the circumsecondary disk (CSD). Although the secondary 
black hole is accreting at a super-Eddington rate, recent three-dimensional radi- 
ation magnetohydrodynamic simulations of super-Eddington accretion find 
radiative efficiencies comparable to the values in standard thin-disk models*”. 
On the other hand, the primary black hole is accreting below the critical rate 
$\dot M_1 \lesssim \dot M_{\rm ADAF} \approx 0.027(\alpha/0.3)^2\dot M_{1,\rm Edd}$ (ADAF, advection-dominated accretion flow; 
α is the viscosity parameter) at which advection dominates the energy balance*’. 
We therefore estimate the luminosity of the primary black hole from a radiatively 
inefficient ADAF*”, rather than from a Shakura–Sunyaev disk. This interpretation 
is supported by the fact that PG 1302-102 is known to be an extended radio 
source, with evidence for a jet and bends in the extended radio structure™, features 
that are commonly associated with sub-Eddington sources”. 
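The split of the accretion flow can be checked numerically. The sketch below takes $M = 10^{9.4}M_\odot$ and q = 0.05 from the Extended Data Fig. 1 discussion. It uses a hypothetical γ = 15, the midpoint of the simulated range. It also assumes the standard $L_{\rm Edd} \approx 1.26\times10^{38}(M/M_\odot)$ erg s⁻¹ and α = 0.3 for the ADAF threshold. The function names are hypothetical.

```python
C = 2.998e10; M_SUN = 1.989e33; YEAR = 3.156e7
L_EDD_PER_MSUN = 1.26e38   # erg/s per solar mass (assumed standard value)


def eddington_rate(mass_msun: float, eps: float = 0.1) -> float:
    """Mdot_Edd = L_Edd / (eps c^2) in M_sun/yr, with the conventional eps = 0.1."""
    return L_EDD_PER_MSUN * mass_msun / (eps * C**2) * YEAR / M_SUN


def eddington_ratios(mdot_total: float, m_total: float, q: float, gamma: float):
    """Eddington ratios of the CBD (total mass), primary and secondary, splitting
    Mdot = Mdot1 + Mdot2 with gamma = Mdot2 / Mdot1 and q = M2 / M1."""
    m1 = m_total / (1.0 + q)
    m2 = q * m1
    mdot1 = mdot_total / (1.0 + gamma)
    mdot2 = gamma * mdot1
    return (mdot_total / eddington_rate(m_total),
            mdot1 / eddington_rate(m1),
            mdot2 / eddington_rate(m2))


cbd, primary, secondary = eddington_ratios(3.7, 10**9.4, q=0.05, gamma=15.0)
adaf_threshold = 0.027 * (0.3 / 0.3) ** 2   # critical rate for viscosity alpha = 0.3
print(f"CBD {cbd:.3f}, primary {primary:.4f} (ADAF below {adaf_threshold}), "
      f"secondary {secondary:.2f}")
```

Under these assumptions the total ratio comes out near 0.07. The secondary is super-Eddington, and the primary falls below the ADAF threshold, consistent with the picture described above.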

For the radiatively efficient CBD and CSD, the frequency-dependent luminosity 
is determined by integrating the local, modified black-body flux over the area of 
the disk: 


$$L_\nu = 2\pi \int_{R_{\rm in}}^{R_{\rm out}} F_\nu[T_p(r)]\, r\, dr$$
$B_\nu$ is the Planck function, $\kappa^{\rm abs}_\nu$ is the frequency-dependent absorption opacity, $\kappa^{\rm es}$ is 
the electron-scattering opacity, and $r \in (R_{\rm in}, R_{\rm out})$ is the radial coordinate, with 
$R_{\rm in,out}$ the inner and outer radii of the appropriate disk. We compute the radial 
disk-photosphere temperature profile $T_p$ by equating the viscous heating rate with 
the modified black-body flux: 
σ is the Stefan–Boltzmann constant, $k_B$ is the Boltzmann constant, $h$ is the 
Planck constant, and $r_{\rm ISCO}$ is the radius of the innermost stable circular orbit 
(ISCO). When solving for the photosphere temperature, we work in the limit 
$\kappa^{\rm abs}_\nu \ll \kappa^{\rm es}$ following appendix A of ref. 36, and we adopt $r^i_{\rm ISCO} = 6GM_i/c^2$, and 
$(R_{\rm in}, R_{\rm out}) = (2a, 200a)$ and $(R_{\rm in}, R_{\rm out}) = (r^i_{\rm ISCO}, a(q/3)^{1/3})$ for the CBD and CSD, 
respectively. Here the superscript i refers to the ith disk, a is the binary separation, 
$6GM/c^2$ is the location of the ISCO for a Schwarzschild black hole (our results are 
insensitive to this choice) and $a(q/3)^{1/3}$ is the Hill radius of the secondary black 
hole (which provides an upper limit on the size of the CSD”). 
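The disk luminosity integral can be illustrated with a simplified model. The sketch below is not the paper's calculation, for two reasons. First, it replaces the modified black body with a pure black body, so $F_\nu = \pi B_\nu$. Second, it uses the textbook thin-disk temperature with a zero-torque inner edge instead of the photosphere solution described above. It writes out both disk faces explicitly. The black-hole mass and accretion rate in the example are hypothetical, and so are the function names.

```python
import numpy as np

G = 6.674e-8; C = 2.998e10; H = 6.626e-27; K_B = 1.381e-16; SIGMA_SB = 5.670e-5


def t_eff(r, mass, mdot, r_in):
    """Textbook thin-disk temperature: sigma T^4 = 3 G M Mdot / (8 pi r^3) (1 - sqrt(r_in / r))."""
    flux = 3.0 * G * mass * mdot / (8.0 * np.pi * r**3) * (1.0 - np.sqrt(r_in / r))
    return (flux / SIGMA_SB) ** 0.25


def planck_nu(nu, temp):
    """Planck function B_nu(T) in erg s^-1 cm^-2 Hz^-1 sr^-1."""
    with np.errstate(over="ignore"):          # exp overflow -> B_nu = 0 where the disk is cold
        return 2.0 * H * nu**3 / C**2 / np.expm1(H * nu / (K_B * temp))


def trapezoid(y, x):
    """Trapezoidal rule (written out to avoid NumPy-version differences)."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))


def disk_l_nu(nu, mass, mdot, r_in, r_out, n=4000):
    """L_nu from both faces: 2 * ∫ pi B_nu[T(r)] 2 pi r dr (black-body emission assumed)."""
    r = np.geomspace(r_in * (1.0 + 1e-9), r_out, n)
    return 2.0 * trapezoid(np.pi * planck_nu(nu, t_eff(r, mass, mdot, r_in)) * 2.0 * np.pi * r, r)


mass = 1e8 * 1.989e33                 # hypothetical 10^8 M_sun black hole [g]
mdot = 1.989e33 / 3.156e7             # hypothetical 1 M_sun/yr [g/s]
r_in = 6.0 * G * mass / C**2          # Schwarzschild ISCO
l_v = disk_l_nu(5.45e14, mass, mdot, r_in, 1e4 * r_in)   # near the V band (~0.55 um)
print(f"L_nu(V) ≈ {l_v:.2e} erg/s/Hz")
```

A useful sanity check is the area integral of σT⁴ over both faces. It should approach GMṀ/(2r_in) for a large outer radius.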

The optical luminosity of an ADAF is sensitive to the assumed microphysical 
parameters and its computation is more complicated than that for a thin disk. Here 
we first compute a reference thin-disk luminosity $L_{\rm SS}$ (SS, Shakura–Sunyaev) for 
the primary black hole, and multiply it by the ratio of the bolometric luminosity of 
an ADAF to an equivalent thin-disk luminosity from ref. 32: 


$$\frac{L_{\rm ADAF}}{L_{\rm SS}} \approx 0.008\left(\frac{\dot M/\dot M_{\rm Edd}}{0.0034}\right)\left(\frac{\alpha}{0.3}\right)^{-2} \qquad (5)$$


For calculating the reference $L_{\rm SS}$, we adopted parameters that are consistent with 
ref. 32, in particular, $\epsilon = 0.1$. Although the above ratio is for bolometric luminosities, 
we find that it agrees well with the factor-of-100 difference in the V band 
shown in figure 6 of ref. 33 between ADAF and thin-disk spectra, with parameters 
similar to PG 1302-102 (10°Mo, M=Mapar=10~!>Mgga, & ~ 0.3). 

Extended Data Figure 1 shows the thin-disk CBD and CSD spectra for a total 
Eddington ratio of $\dot m_{\rm CBD,Edd} = 0.07$, consistent with the high-mass estimates for 
PG 1302-102 that are needed for the beaming scenario ($M = 10^{9.4}M_\odot$ and 
q = 0.05). The red dot shows the reduced V-band luminosity of an ADAF onto 
the primary. The secondary clearly dominates the total V-band luminosity, with 
the primary contributing less than 1%, and the CBD contributing approximately 
14%. In practice, the contribution from the CBD becomes non-negligible 
only for the smallest binary masses and lowest mass ratios (reaching 20% for 
M<10°Mo and q< 0.025). 

We compute the contributions of each of the three components to the total 
luminosity, $L^V_{\rm tot} = L^V_1 + L^V_2 + L^V_{\rm CBD}$, and the corresponding total fractional-modulation 
amplitude $\Delta L^V_{\rm tot}/L^V_{\rm tot} = (\Delta L^V_1 + \Delta L^V_2)/L^V_{\rm tot}$, for each value of the 
total mass M and mass ratio q. The primary is assumed to be Doppler modulated 
with a line-of-sight velocity $v_1 = -q v_2$, whereas the emission from the CBD is 
assumed to be constant over time ($\Delta L^V_{\rm CBD} = 0$). Extended Data Figure 2 shows 
regions in (M, q, i) parameter space where the total luminosity variation due to 
relativistic beaming exceeds 14%. This figure recreates Fig. 1 of the main text, but 
using the luminosity contributions computed self-consistently in the above model, 
rather than assuming a constant value of $f_2$. Because the secondary is found to be 
dominant, the relativistic-beaming scenario is consistent with a wide range of 
binary parameters. 
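The three-component modulation can be sketched to first order in v/c. The secondary contributes $+f_2(3-\alpha)\beta_2\sin i$, the primary contributes $-f_1(3-\alpha)q\beta_2\sin i$, and the CBD is unmodulated. The sketch below assumes a common spectral slope α = 1.1 for both minidisks and the 4.04-yr orbital period. The luminosity fractions in the example are hypothetical; an f₂ of 0.85 corresponds roughly to a 14% CBD share. The function names are hypothetical.

```python
import math

G = 6.674e-8; M_SUN = 1.989e33; C = 2.998e10; YEAR = 3.156e7


def secondary_beta(m_total_msun: float, q: float, period_yr: float) -> float:
    """beta_2 = v_2 / c with v_2 = (2 pi G M / P)^(1/3) / (1 + q) (circular orbit)."""
    v_rel = (2.0 * math.pi * G * m_total_msun * M_SUN / (period_yr * YEAR)) ** (1.0 / 3.0)
    return v_rel / (1.0 + q) / C


def total_modulation(m_total_msun, q, incl_deg, f2, f1, period_yr=4.04, alpha=1.1):
    """First-order fractional V-band modulation: the secondary (fraction f2) moves
    with +beta_2, the primary (fraction f1) with -q beta_2, and the CBD
    (the remaining fraction) is unmodulated."""
    los = secondary_beta(m_total_msun, q, period_yr) * math.sin(math.radians(incl_deg))
    return (3.0 - alpha) * los * (f2 - q * f1)


amp = total_modulation(10**9.4, q=0.05, incl_deg=90.0, f2=0.85, f1=0.01)
print(f"fractional modulation: ±{amp:.3f} (the beaming scenario needs ≳ 0.14)")
```

Scanning this function over (M, q, i) and keeping the points above 0.14 is the essence of the parameter-space maps described above. The published maps use the self-consistent luminosity fractions rather than fixed ones.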
Model fitting to the PG 1302-102 optical light curve. We fitted models to 
the observed light curve of PG 1302-102 by maximizing the Bayesian likelihood 
$\mathcal{L} \propto \det|\mathrm{Cov}|^{-1/2}\exp(-\chi^2/2)$, where $\chi^2 = Y^{\rm T}(\mathrm{Cov})^{-1}Y$ and 
$Y = O - M$ is the difference vector between the mean flux predicted in a model 
and the observed flux at each observation time $t_i$. Here Cov is the covariance 
matrix of flux uncertainties, allowing for correlations between fluxes measured 
at different $t_i$. We include two types of uncertainties: (1) random (uncorrelated) 
measurement errors 

$$\mathrm{Cov}^{\rm phot}_{ij} = \sigma_i^2\,\delta_{ij},$$

where $\sigma_i^2$ is the variance in the photometric measurement for the ith data point (as 
reported in ref. 6); and (2) correlated noise due to intrinsic quasar variability, with 
covariance between the ith and jth data points 

$$\mathrm{Cov}^{\rm DRW}_{ij} = \sigma_D^2\exp\left[-\frac{|t_i - t_j|}{(1+z)\tau_D}\right],$$
The parameters $\sigma_D$ and $\tau_D$ determine the amplitude and rest-frame coherence 
time, respectively, of correlated noise described by the damped random walk 
model, and the factor of (1 + z) converts $\tau_D$ to the observer’s frame where the 
$t_i$ are measured. The normalization of the Bayesian likelihood depends on these 
parameters, and therefore the normalization must be included when maximizing 
the likelihood over these parameters**. The covariance matrix for the total noise is 
the sum of the measurement-error and damped-random-walk terms, $\mathrm{Cov} = \mathrm{Cov}^{\rm phot} + \mathrm{Cov}^{\rm DRW}$. 
We assume both types of noise are Gaussian, 
which provides a good description of observed quasar variability”. 
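The likelihood above can be evaluated efficiently with a Cholesky factorization. The sketch below keeps the log-determinant term, since it depends on the noise parameters. It assumes the damped-random-walk covariance takes the form σ_D² exp(−|t_i − t_j|/[(1+z)τ_D]), which is one common normalization convention. The function name is hypothetical.

```python
import numpy as np


def log_likelihood(t, obs, model, sigma_phot, sigma_d, tau_d, z):
    """Gaussian log-likelihood with Cov = diag(sigma_i^2) + DRW covariance
    sigma_d^2 exp(-|t_i - t_j| / ((1 + z) tau_d)) (assumed normalization)."""
    y = obs - model                                   # difference vector Y = O - M
    dt = np.abs(t[:, None] - t[None, :])
    cov = np.diag(sigma_phot**2) + sigma_d**2 * np.exp(-dt / ((1.0 + z) * tau_d))
    chol = np.linalg.cholesky(cov)                    # Cov = L L^T
    u = np.linalg.solve(chol, y)                      # u = L^-1 Y, so chi^2 = u.u
    chi2 = u @ u
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))     # ln det Cov
    # The det term must be kept: it depends on sigma_d and tau_d.
    return -0.5 * (chi2 + log_det + len(y) * np.log(2.0 * np.pi))
```

Maximizing this function over the model parameters plus (σ_D, τ_D), or sampling it with an MCMC code, gives the kind of fit described in the text.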

We then fit the following four different types of models to the data. 
(1) A relativistic beaming model with 5 + 2 = 7 model parameters: eccentricity e, 
argument of pericentre $\omega$, amplitude K, phase $t_0$, and orbital period P, as well as the 
two noise parameters $\sigma_D$ and $\tau_D$. 
(2) An accretion rate model with 3 + 2 = 5 model parameters: amplitude K, phase 
$t_0$, and period P, as well as the two noise parameters $\sigma_D$ and $\tau_D$. This model 
assumes that the light curve of PG 1302-102 tracks the mass-accretion rates that 
are predicted in hydrodynamic simulations. For near-equal-mass binaries, several 


©2015 Macmillan Publishers Limited. All rights reserved 


studies have found that the mass-accretion rates fluctuate periodically, but 
resemble a series of sharp bursts, unlike the smoother, sinusoid-like shape of 
the light curve of PG 1302-102. To our knowledge, only three studies so far have 
simulated unequal-mass (q ≲ 0.1) SMBH binaries'’*>. The accretion rates for 
these binaries are less ‘bursty’; among all of the cases in these three studies, the 
q = 0.075 and q = 0.1 binaries in ref. 11 resemble the light curve of PG 1302-102 
most closely (see Extended Data Fig. 3). Here we adopt the published accretion 
curve for q = 0.1, and perform a fit to PG 1302-102 by allowing an arbitrary linear 
scaling in time and amplitude, as well as a shift in phase; this gives us the three free 
parameters for this model. (We find that the q = 0.075 case provides a worse fit.) 
(3) A sinusoid model with 3 + 2 = 5 parameters: amplitude K, phase $t_0$, and period 
P, as well as the two noise parameters $\sigma_D$ and $\tau_D$. This model is equivalent, to first 
order in v/c, to the beaming model restricted to a circular binary orbit. 

(4) A constant luminosity model with 2 parameters, the noise parameters $\sigma_D$ and 
$\tau_D$. This model is for reference only, to quantify how poor the fit is with only these 
parameters. 

In each of these models, we fixed the mean flux to correspond to the mean 
magnitude M that is inferred from the optical data; allowing the mean to be an 
additional free parameter did not change our results. The highest maximum like- 
lihood is found for the beaming model, with best-fit values of P= 1,996 +32 days, 
K=0.065 70007, e= 0.094007, cos(w)= —0.65*)2., and ty) =7181 4? days, 
where the reference point $t_0$ is measured from MJD − 49,100. Uncertainties are 
computed with the ‘emcee’ code*’, which implements a Markov chain Monte Carlo 
algorithm, and which we use to sample the seven-dimensional posterior probability 
of the model given the data in ref. 6. We use 28 individual chains to sample the 
posterior for 1,024 steps each. Throwing away the first 600 steps (‘burning in’), we 
run for 424 steps and for each parameter we quote best-fit values corresponding to 
the maximum posterior probability, with errors given by the 85th and 15th per- 
centile values (marginalized over the other six parameters). The best-fit noise 
parameters are (op, Tp) =(0.0491)-0\° mag, 37.6 *}% days). The best-fit model 
has a reduced $\chi^2/(N - 1 - 7) \approx 2.1$, where N = 245 is the number of data points. 

To assess which of the models is favoured by the data, we use the Bayesian 
information criterion (BIC), a standard method for comparing different models 
that penalizes models with a larger number of free parameters*’. Specifically, 
$\mathrm{BIC} = -2\ln(\mathcal{L}) + k\ln(N)$, where the first term is evaluated using the best-fit 
parameters in each of the models and k is the number of model parameters. We 
find the following differences ΔBIC between pairs of models: 

$\mathrm{BIC}_{\rm acc} - \mathrm{BIC}_{\rm beam} = 4.0$ (the beaming model is preferred over the accretion 
model); 

$\mathrm{BIC}_{\rm acc} - \mathrm{BIC}_{\rm sin} = 14.9$ (the sinusoid model is strongly preferred over the accretion 
model); 

$\mathrm{BIC}_{\rm sin} - \mathrm{BIC}_{\rm beam} = -10.9$ (the sinusoid model is strongly preferred over the 
beaming model); 

$\mathrm{BIC}_{\rm const} - \mathrm{BIC}_{\rm beam} = 11.5$ (the beaming model is strongly preferred over pure 
noise); and 

$\mathrm{BIC}_{\rm const} - \mathrm{BIC}_{\rm sin} = 22.4$ (the sinusoid model is strongly preferred over pure 
noise). 

We conclude that a sinusoid, or equivalently the beaming model restricted to a 
circular binary, is the preferred model. This model is very strongly favoured over 
the best-fit accretion model (see Extended Data Fig. 3), with ΔBIC = 14.9. For the 
assumed Gaussian distributions, this corresponds to an approximate likelihood 
ratio of exp(−14.9/2) ≈ 5.7 × 10⁻⁴. Although our best-fit beaming model has a 
small non-zero eccentricity, the seven-parameter eccentric model is disfavoured 
(by ΔBIC = 10.9) relative to the five-parameter circular case. 
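The pairwise ΔBIC values are mutually consistent. They can be reconstructed from three of them, and the implied likelihood ratio follows from exp(−ΔBIC/2). The sketch below does this bookkeeping. The dictionary of differences relative to the beaming model is a hypothetical helper built from the numbers listed above.

```python
import math


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion: -2 ln(L) + k ln(N)."""
    return -2.0 * log_likelihood + k * math.log(n)


# BIC differences quoted above, expressed relative to the beaming model (BIC_x - BIC_beam).
rel = {"beam": 0.0, "acc": 4.0}
rel["sin"] = rel["acc"] - 14.9        # from BIC_acc - BIC_sin = 14.9
rel["const"] = 11.5                   # BIC_const - BIC_beam
best = min(rel, key=rel.get)          # the lowest BIC is preferred
print(best, round(rel["const"] - rel["sin"], 1), f"{math.exp(-14.9 / 2):.2e}")
```

The two extra orbital parameters of the eccentric model cost 2 ln 245 ≈ 11.0 in BIC. A difference of −10.9 therefore means that the eccentric fit improves −2 ln ℒ by only about 0.1.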

We conservatively allowed the amplitude of accretion-rate fluctuations to be a 
free parameter in the accretion models, but we note that the accretion-rate 
variability measured in hydrodynamic simulations exhibits large (order one) 
deviations from the mean, even for 0.05 < q < 0.1 binaries'''*"'. In the accretion-rate 
model, an additional physical mechanism needs to be invoked to damp the fluc- 
tuations to the smaller, approximately 14% amplitude seen in PG 1302-102 (such 
as a more substantial contribution from the CBD and/or the primary). 
Disk precession. The lowest-BIC model, with a steady accretion rate and a relativistic 
boost from a circular orbit, has a reduced $\chi^2 \approx 2.1$, indicating that the 
relativistic-boost model with intrinsic noise does not fully describe the observed 
light curve. The residuals could be explained by a lower-amplitude periodic modu- 
lation in the mass-accretion rate, which is expected to have a non-sinusoidal shape 
(with sharper peaks and broader troughs, as mentioned above”’). Alternatively, the 
minidisks, which we implicitly assumed to be co-planar with the binary orbit, 
could instead have a substantial tilt’”. 

A circumsecondary minidisk that is tilted with respect to the orbital plane of the binary will precess around the binary angular-momentum vector, causing additional photometric variations due to the changing projected area of the disk on the sky. The precession timescale can be estimated from the total angular momentum of the secondary disk and the torque exerted on it by the primary black hole. The ratio of the precession period to the orbital period of the binary is

$$\frac{P_{\rm prec}}{P_{\rm orb}} = \frac{8}{\sqrt{3}}\,\frac{\sqrt{1+q}}{\cos\theta},$$


where θ ∈ (−π/2, π/2) is the angle between the angular-momentum vectors of the disk and the binary, and we have chosen the outer edge of the minidisk to coincide with the Hill sphere of the secondary black hole, R_H = (q/3)^{1/3} a, for binary semi-major axis a. This choice gives the largest secondary disk and the shortest precession periods. For small binary mass ratios, consistent with the relativistic-beaming scenario, the precession period can be as short as 4.8 P_orb, which causes variations on a timescale that spans the current observations of PG 1302-102. The precession timescale would be longer (>20 P_orb) for a smaller secondary disk that is tidally truncated at 0.27 q^0.3 a (ref. 43), and with a more inclined (45°) disk.
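The two quoted timescales can be reproduced numerically. The sketch below evaluates P_prec/P_orb = 8√(1+q)/(√3 cos θ) for a disk filling the Hill sphere. For any other outer radius it assumes that P_prec scales as (a/R_out)^{3/2}, which is what the same torque-to-angular-momentum estimate gives when both quantities are dominated by the disk's outer edge; that generalization, and the use of q = 0.05 to 0.1 as representative mass ratios, are our illustrative assumptions.

```python
import math


def precession_ratio(q, theta_deg, r_out_over_a=None):
    """P_prec / P_orb for a tilted circumsecondary minidisk.

    With r_out_over_a=None the disk edge is the Hill radius
    R_H = (q/3)**(1/3) * a, which reproduces 8*sqrt(1+q)/(sqrt(3)*cos(theta)).
    For other edges we ASSUME P_prec scales as (a/R_out)**1.5
    (outer-edge-dominated disk); this generalisation is ours.
    """
    cos_t = math.cos(math.radians(theta_deg))
    if r_out_over_a is None:
        r_out_over_a = (q / 3.0) ** (1.0 / 3.0)
    return (8.0 / 3.0) * math.sqrt(q * (1.0 + q)) * r_out_over_a ** -1.5 / cos_t


for q in (0.05, 0.075, 0.1):
    hill = precession_ratio(q, 0.0)                      # largest disk, aligned
    trunc = precession_ratio(q, 45.0, 0.27 * q ** 0.3)   # tidally truncated, 45 deg
    print(f"q = {q}: Hill-sphere disk {hill:.2f} P_orb, "
          f"truncated 45-degree disk {trunc:.1f} P_orb")
```

For q between 0.05 and 0.1 the Hill-sphere case gives roughly 4.7 to 4.9 orbital periods, and the truncated, 45°-inclined disk exceeds 20 orbital periods, in line with the numbers quoted above.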

Archival UV data. FUV (0.14–0.175 μm) and NUV (0.19–0.27 μm) spectra of PG 1302-102 have been obtained by HST and GALEX since 1992. HST FOS NUV spectra were obtained on 17 July 1992 (pre-COSTAR; ref. 44). HST STIS FUV spectra were obtained on 21 August 2001 (ref. 45). GALEX FUV and NUV spectra were obtained on 8 March 2008 and 6 April 2009, and HST COS FUV spectra were obtained on 28 January 2011. All data are publicly available through the Mikulski Archive for Space Telescopes at http://archive.stsci.edu. All measurements were spectrophotometrically calibrated, and binned or smoothed to a resolution of 1–3 Å. The spectra (Fig. 2) have errors per bin that are typically less than 2%; published absolute photometric accuracies are better than 5%.

From each spectrum, average flux measurements (Fig. 2) were obtained in one or both of two discrete bands: FUV continuum (0.145–0.1525 μm; a range chosen to avoid the Lyα line) and NUV continuum (0.20–0.26 μm). For the GALEX NUV photometric data (also used in Fig. 2) we adopted a small correction (0.005 mag) for the transformation from the GALEX NUV band to our NUV continuum band. GALEX FUV photometric data were not used because of the substantial contribution from redshifted Lyα. The broad lines in the UV spectra (in Fig. 3) do not show a Doppler shift as large as Δλ = (v/c)λ ≈ 140 Å. This is unsurprising, because the broad-line widths (2,500–4,500 km s^-1) are much smaller than the inferred relativistic line-of-sight velocities, and the lines are expected to be produced by gas at larger radii, unrelated to the rapidly orbiting minidisks that produce the featureless thermal continuum emission.
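The band-averaging step can be illustrated with a short sketch. It computes an unweighted mean over the spectrum pixels inside each continuum band quoted above and applies a magnitude offset as a flux scaling, F′ = F × 10^(−0.4 Δm). The toy spectrum is synthetic, not PG 1302-102 data; the emission line is placed at 0.1554 μm, roughly where Lyα falls for an assumed redshift z ≈ 0.278, and the unweighted mean is our simplification (the weighting actually used is not stated).

```python
import numpy as np

# Continuum bands quoted in the text, in micrometres.
FUV_BAND_UM = (0.145, 0.1525)  # chosen to avoid the Ly-alpha line
NUV_BAND_UM = (0.20, 0.26)


def band_mean_flux(wave_um, flux, band):
    """Unweighted mean flux of the spectrum pixels inside [lo, hi]."""
    lo, hi = band
    mask = (wave_um >= lo) & (wave_um <= hi)
    if not mask.any():
        raise ValueError("spectrum does not cover the requested band")
    return float(flux[mask].mean())


def apply_mag_offset(flux, delta_mag):
    """Rescale a flux by a magnitude correction: F' = F * 10**(-0.4 * delta_mag)."""
    return flux * 10.0 ** (-0.4 * delta_mag)


# Toy spectrum (NOT real data): flat continuum of 1.0 plus a narrow emission
# line at 0.1554 um, just redward of the FUV band (assumed z ~ 0.278).
wave = np.linspace(0.14, 0.27, 2601)
flux = 1.0 + 5.0 * np.exp(-0.5 * ((wave - 0.1554) / 0.0005) ** 2)

fuv = band_mean_flux(wave, flux, FUV_BAND_UM)
nuv = apply_mag_offset(band_mean_flux(wave, flux, NUV_BAND_UM), 0.005)
print(f"FUV band mean: {fuv:.4f}, corrected NUV band mean: {nuv:.4f}")
```

Because the FUV band stops short of the line, its mean recovers the continuum level; the sign of the 0.005 mag correction is not stated in the text, and the sketch simply applies it as a positive offset.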
Extended Data Figure 1 | Model spectrum of PG 1302-102. Circumbinary (dashed blue) and circumsecondary (solid black) disk spectra for a total binary mass of 10°“ M⊙, binary mass ratio of q = 0.05, and ratio of accretion rates Ṁ_p/Ṁ_s = 20. A vertical dashed line marks the centre of the V band, and the approximate flux from an advection-dominated accretion flow (ADAF) is shown as a red dot for the V-band contribution of the primary. The spectrum for a radiatively efficient, thin disk around the primary is shown by the thin red dashed curve for reference.
Extended Data Figure 2 | Parameter combinations for which the combined V-band luminosity of the three-component system varies by the required 0.14 mag. M is the binary mass, q is the mass ratio, and i is the orbital inclination angle. This figure is analogous to Fig. 1, except that instead of adopting an ad hoc fractional luminosity contribution f_s by the secondary, the luminosities of each of the three components are computed from a model: the luminosity of the primary is assumed to arise from an ADAF, whereas the luminosity of the secondary is generated by a modestly super-Eddington thin disk. Emission from the circumbinary disk is also from a thin disk, and is negligible except for binaries with the lowest mass ratios, q < 0.01 (see text).
Extended Data Figure 3 | Model fits to the optical light curve of PG 1302-102. Best-fit curves assuming relativistic boost from a circular binary (solid black curves), a pure sinusoid (red dotted curves), and accretion-rate variability adopted from hydrodynamic simulations (blue dashed curves) for a q = 0.075 (left) and a q = 0.1 (right) binary. The grey points with 1σ error bars are the data for PG 1302-102 (ref. 6).
Spawning rings of exceptional points out of Dirac cones

Bo Zhen, Chia Wei Hsu, Yuichi Igarashi, Ling Lu, Ido Kaminer, Adi Pick, Song-Liang Chua, John D. Joannopoulos & Marin Soljačić


The Dirac cone underlies many unique electronic properties of graphene and topological insulators, and its band structure (two conical bands touching at a single point) has also been realized for photons in waveguide arrays, atoms in optical lattices, and through accidental degeneracy. Deformation of the Dirac cone often reveals intriguing properties; an example is the quantum Hall effect, where a constant magnetic field breaks the Dirac cone into isolated Landau levels. A seemingly unrelated phenomenon is the exceptional point, also known as the parity-time symmetry breaking point, where two resonances coincide in both their positions and widths. Exceptional points lead to counter-intuitive phenomena such as loss-induced transparency, unidirectional transmission or reflection, and lasers with reversed pump dependence or single-mode operation. Dirac cones and exceptional points are connected: it was theoretically suggested that certain non-Hermitian perturbations can deform a Dirac cone and spawn a ring of exceptional points. Here we experimentally demonstrate such an ‘exceptional ring’ in a photonic crystal slab. Angle-resolved reflection measurements of the photonic crystal slab reveal that the peaks of reflectivity follow the conical band structure of a Dirac cone resulting from accidental degeneracy, whereas the complex eigenvalues of the system are deformed into a two-dimensional flat band enclosed by an exceptional ring. This deformation arises from the dissimilar radiation rates of dipole and quadrupole resonances, which play a role analogous to the loss and gain in parity-time symmetric systems. Our results indicate that the radiation existing in any open system can fundamentally alter its physical properties in ways previously expected only in the presence of material loss and gain.

Closed and lossless physical systems are described by Hermitian operators, which guarantee realness of the eigenvalues and a complete set of eigenfunctions that are orthogonal to each other. On the other hand, systems with open boundaries or with material loss and gain are non-Hermitian, and have non-orthogonal eigenfunctions with complex eigenvalues whose imaginary part corresponds to decay or growth. The most drastic difference between Hermitian and non-Hermitian systems is that the latter exhibit exceptional points (EPs), where both the real and the imaginary parts of the eigenvalues coalesce. At an EP, two (or more) eigenfunctions collapse into one, so the eigenspace no longer forms a complete basis, and this eigenfunction becomes orthogonal to itself under the unconjugated ‘inner product’. To date, most studies of the EP and its intriguing consequences concern parity-time symmetric systems that rely on material loss and gain, but EPs are a general property that requires only non-Hermiticity. Here, we show the existence of EPs in a photonic crystal slab with negligible absorption loss and no artificial gain. When a Dirac-cone system has dissimilar radiation rates, the band structure is altered abruptly to show branching features with a ring of EPs. We provide a complete picture of this system, ranging from an analytic model and numerical simulations to experimental observations; taken together, these results illustrate the role of radiation-induced non-Hermiticity that bridges the study of EPs and the study of Dirac cones.

We start by showing that non-Hermiticity from radiation can deform an accidental Dirac point into a ring of EPs. First, consider a two-dimensional photonic crystal (Fig. 1a inset), where a square lattice (periodicity a) of circular air holes (radius r) is introduced in a dielectric material. This is a Hermitian system, as there is no material gain or loss and no open boundary for radiation. By tuning a system parameter (for example, r), one can achieve accidental degeneracy between a quadrupole mode and two degenerate dipole modes at the Γ point (centre of the Brillouin zone), leading to a linear Dirac dispersion due to the anti-crossing between two bands with the same symmetry. The accidental Dirac dispersion from the effective Hamiltonian model (see equation (1) below with γ_d = 0) is shown as solid lines in Fig. 1a, agreeing with numerical simulation results (symbols). In the effective Hamiltonian we do not consider the dispersionless third band (grey line) owing to symmetry arguments (Supplementary Information section I), although this third band cannot be neglected in certain calculations, including the Berry phase and effective medium properties.

Next, we consider a similar, but open, system: a photonic crystal slab (Fig. 1b inset) with finite thickness h. With the open boundary, modes within the radiation continuum become resonances because they radiate by coupling to extended plane waves in the surrounding medium. Non-Hermitian perturbations need to be included in the Hamiltonian to account for the radiation loss. To leading order, radiation of the dipole mode can be described by adding an imaginary part −iγ_d to the Hamiltonian, while the quadrupole mode does not radiate owing to its symmetry mismatch with the plane waves. Specifically, at the Γ point the system has C2 rotational symmetry (invariance under 180° rotation around the z axis), and the quadrupole mode does not couple to the radiating plane wave because the former has a field profile E(r) that is even under C2 rotation, E(r) = Ô_C2 E(r), whereas the latter is odd, E(r) = −Ô_C2 E(r). The effective Hamiltonian is


$$H_{\rm eff} = \begin{pmatrix} \omega_0 & v_g|\mathbf{k}| \\ v_g|\mathbf{k}| & \omega_0 - i\gamma_d \end{pmatrix} \qquad (1)$$


with complex eigenvalues

$$\omega_\pm = \omega_0 - \frac{i\gamma_d}{2} \pm v_g\sqrt{|\mathbf{k}|^2 - k_c^2} \qquad (2)$$


where ω0 is the frequency at accidental degeneracy, v_g is the group velocity of the linear Dirac dispersion in the absence of radiation, |k| is the magnitude of the in-plane wavevector (k_x, k_y), and k_c = γ_d/(2v_g). Here, one of the three bands is decoupled from the other two and is not included in equation (1) (see Supplementary Information section II). In equation (2), a ring defined by |k| = k_c separates the k space into two regions: inside the ring (|k| < k_c), Re(ω±) are dispersionless and degenerate; outside the ring (|k| > k_c), Im(ω±) are dispersionless and degenerate. In the vicinity of k_c, Im(ω±) and Re(ω±) exhibit square-root dispersion (also known as branching behaviour) inside and outside the ring, respectively. Exactly on the ring (|k| = k_c), the two eigenvalues ω± are degenerate in both real and imaginary parts; meanwhile, the matrix H_eff becomes defective, with an incomplete eigenspace spanned by only one eigenvector, (1, −i)^T, that is orthogonal to itself under the unconjugated ‘inner product’ a^T b for vectors a and b. This self-orthogonality is the defining property of EPs; hence, here we have not just one EP, but a continuous ring of EPs. We call it an exceptional ring.

Figure 1b, c shows the complex eigenvalues of the photonic crystal slab structure calculated numerically (symbols), which closely follow the analytic model of equation (2), shown as solid lines in the figure. In Supplementary Fig. 1, we show that the two eigenvectors indeed coalesce into one at the EP, which is impossible in Hermitian systems (also see Supplementary Information section III). When the radius r of the holes is tuned away from accidental degeneracy, the exceptional ring and the associated branching behaviour disappear, as shown in Supplementary Fig. 2. Several properties of the photonic crystal slab contribute to the existence of this exceptional ring. Owing to periodicity, one can probe the dispersion along two degrees of freedom, k_x and k_y, in just one structure. The open boundary provides radiation loss, and the C2 rotational symmetry differentiates the radiation loss of the dipole mode from that of the quadrupole mode.

We can rigorously show that the exceptional ring exists in realistic photonic crystal slabs, not just in the effective Hamiltonian model. Our proof is based on the unique topological property of EPs: when the system parameters evolve adiabatically along a loop encircling an EP, the two eigenvalues switch their positions when the system returns to its initial parameters, in contrast to the typical case where the two eigenvalues return to themselves. Using this property, we numerically show, in Supplementary Fig. 3 and Supplementary Information section IV, that the complex eigenvalues always switch their positions along every direction in the k space, and therefore prove the existence of this exceptional ring. As opposed to the simplified effective Hamiltonian model, in a real photonic crystal slab the EP may exist at a slightly different magnitude of k and for a slightly different hole radius r along different directions in the k space, but this variation is small and negligible in practice (Supplementary Information section V).

Figure 1 | Accidental degeneracy in Hermitian and non-Hermitian photonic crystals. a, Band structure of a two-dimensional photonic crystal consisting of a square lattice of circular air holes. Tuning the radius r leads to accidental degeneracy between a quadrupole band and two doubly degenerate dipole bands, resulting in two bands with linear Dirac dispersion (red and blue) and a flat band (grey). b, c, The real (b) and imaginary (c) parts of the eigenvalues of an open, and therefore non-Hermitian, system: a photonic crystal slab with finite thickness, h. By tuning the radius, accidental degeneracy in the real part can be achieved, but the Dirac dispersion is deformed owing to the non-Hermiticity. The analytic model predicts that the real (imaginary) part of the eigenvalue stays constant inside (outside) a ring in the wavevector space, indicating two flat bands in dispersion, with a ring of exceptional points (EPs) where both the real and the imaginary parts are degenerate. The orange shaded regions correspond to the inside of the ring. In the upper panels of a–c, solid lines are predictions from the analytic model and symbols are from numerical simulations: red squares represent the band connecting to the quadrupole mode at the centre; blue circles represent the band connecting to the dipole mode at the centre; and grey crosses represent the third band, which is decoupled from the previous two owing to symmetry. The three-dimensional plots in the lower panels are from simulations.
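The properties of the two-band model in equations (1) and (2) can be checked numerically. The NumPy sketch below uses illustrative values ω0 = 1, v_g = 1 and γ_d = 0.2 in arbitrary units (not the paper's parameters). It compares numerical and analytic eigenvalues, shows the flat real parts inside the ring and flat imaginary parts outside it, verifies that (1, −i)^T is a self-orthogonal eigenvector on the ring, and mimics the eigenvalue-switching argument. The detuning `delta` of the quadrupole frequency is a hypothetical second parameter introduced only so that a closed loop can encircle the EP in this toy model; it is not the loop used in Supplementary Information section IV.

```python
import numpy as np

# Illustrative parameters in arbitrary units (hypothetical, not fitted values).
OMEGA0, VG, GAMMA_D = 1.0, 1.0, 0.2
KC = GAMMA_D / (2 * VG)  # ring radius k_c = gamma_d / (2 v_g)


def h_eff(k, delta=0.0):
    """Effective Hamiltonian of equation (1) at in-plane |k|.

    `delta` (hypothetical) detunes the quadrupole frequency so that a
    closed loop in the (|k|, delta) plane can encircle the EP.
    """
    return np.array([[OMEGA0 + delta, VG * k],
                     [VG * k, OMEGA0 - 1j * GAMMA_D]], dtype=complex)


def omega_pm(k):
    """Analytic eigenvalues of equation (2)."""
    root = VG * np.sqrt(complex(k**2 - KC**2))
    return OMEGA0 - 0.5j * GAMMA_D + root, OMEGA0 - 0.5j * GAMMA_D - root


def matches_analytic(k, tol=1e-9):
    """True if numerical eigenvalues agree with equation (2) (away from the EP)."""
    numeric = np.linalg.eigvals(h_eff(k))
    return all(np.min(np.abs(numeric - w)) < tol for w in omega_pm(k))


def track_around_ep(radius=0.02, steps=2000):
    """Follow one eigenvalue continuously around a loop centred on the EP
    at (|k|, delta) = (k_c, 0); return the starting pair and the end point."""
    start = np.linalg.eigvals(h_eff(KC + radius))
    current = start[0]
    for t in np.linspace(0.0, 2 * np.pi, steps + 1)[1:]:
        eigs = np.linalg.eigvals(h_eff(KC + radius * np.cos(t), radius * np.sin(t)))
        current = eigs[np.argmin(np.abs(eigs - current))]  # nearest-neighbour continuation
    return start, current


# Inside the ring the real parts coincide; outside it the imaginary parts do.
inside, outside = omega_pm(0.5 * KC), omega_pm(2.0 * KC)
print("inside :", inside)
print("outside:", outside)

# On the ring, (1, -i)^T is an eigenvector and is self-orthogonal: v^T v = 0.
v = np.array([1.0, -1j])
lam = OMEGA0 - 0.5j * GAMMA_D
print("residual:", np.linalg.norm(h_eff(KC) @ v - lam * v), "v^T v =", v @ v)

start, finish = track_around_ep()
print("started at", start[0], "finished at", finish)  # lands on the other eigenvalue
```

Note that `v @ v` on complex NumPy arrays does not conjugate, so it evaluates exactly the unconjugated product a^T b used in the text.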

To demonstrate the existence of the exceptional ring in such a system, we fabricate large-area periodic patterns in a Si3N4 slab (n = 2.02 in the visible spectrum, thickness 180 nm) on top of 6 μm of silica (n = 1.46) using interference photolithography. Scanning electron microscope (SEM) images of the sample are shown in Fig. 2a, featuring a square lattice (periodicity a = 336 nm) of cylindrical air holes with radius 109 nm. We immerse the structure in an optical liquid whose refractive index can be chosen; accidental degeneracy in the Hermitian part is achieved when the liquid index is selected to be n = 1.48. We perform angle-resolved reflectivity measurements (set-up shown in Fig. 2b) between 0° and 2° along the Γ–X and Γ–M directions, for both s and p polarizations. Details of the sample fabrication and the experimental set-up can be found in Supplementary Information section VI. The measured reflectivity for the relevant polarization is plotted in the upper panel of Fig. 2c, showing good agreement with numerical simulation results (lower panel), with differences coming from scattering due to surface roughness, inhomogeneous broadening, and the uncertainty in the measurements of system parameters. The complete experimental result for both polarizations is shown in Supplementary Fig. 4; the third, dispersionless band shows up in the other polarization, decoupled from the two bands of interest.
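To relate the measured angles to the normalized in-plane wavevector k a/2π used in the band diagrams, one can use k_∥ = (2π/λ) n sin θ. The sketch below assumes θ is the propagation angle inside the liquid (n = 1.48) and uses a representative vacuum wavelength of 564 nm (a/λ ≈ 0.596) for illustration; both are our assumptions. If θ is instead the angle in air, the factor n drops out, because n sin θ is conserved across flat interfaces.

```python
import math

A_NM = 336.0      # lattice period a from the text
N_LIQUID = 1.48   # optical-liquid index from the text


def normalized_k(theta_deg, wavelength_nm, n_medium=N_LIQUID, a_nm=A_NM):
    """In-plane wavevector k_par * a / (2*pi) for light at angle theta.

    ASSUMPTION: theta is the propagation angle inside the medium of index
    n_medium, so k_par = (2*pi / lambda) * n_medium * sin(theta).
    """
    return (a_nm / wavelength_nm) * n_medium * math.sin(math.radians(theta_deg))


def normalized_frequency(wavelength_nm, a_nm=A_NM):
    """omega * a / (2*pi*c), which equals a / lambda for vacuum wavelength lambda."""
    return a_nm / wavelength_nm


print(f"omega a / 2 pi c = {normalized_frequency(564.0):.3f}")
for theta in (0.1, 0.31, 0.8, 2.0):
    print(f"theta = {theta:4.2f} deg -> k a / 2 pi = {normalized_k(theta, 564.0):.4f}")
```

Under these assumptions the 0°–2° scan covers k a/2π from 0 to roughly 0.03, so the small-angle features discussed below sit at k a/2π of a few thousandths.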

The peaks of reflectivity (dark red colour in Fig. 2c) follow the linear Dirac dispersion; this feature disappears for structures with different radii that do not reach accidental degeneracy (experimental results in Supplementary Fig. 5). Note that the reflection peaks do not follow the real part of the complex eigenvalues of the Hamiltonian; in fact, they follow the eigenvalues of the Hermitian part of the Hamiltonian, even though the Hamiltonian is non-Hermitian. To understand this, we consider a more general two-by-two Hamiltonian of a coupled-resonance system, H, and separate it into a Hermitian part A and an anti-Hermitian part −iB (A and B are both Hermitian):

$$H = \underbrace{\begin{pmatrix} \omega_1 & \kappa \\ \kappa & \omega_2 \end{pmatrix}}_{A} - i\,\underbrace{\begin{pmatrix} \gamma_1 & \sqrt{\gamma_1\gamma_2} \\ \sqrt{\gamma_1\gamma_2} & \gamma_2 \end{pmatrix}}_{B} \;\xrightarrow{\text{eigenvalues}}\; \begin{pmatrix} \omega_+ & 0 \\ 0 & \omega_- \end{pmatrix} \qquad (3)$$

As before, we use ω± to denote the complex eigenvalues of the Hamiltonian A − iB. Physically, matrix A describes a lossless system, and matrix −iB adds the effects of loss. In B, the diagonal elements are loss rates (in our system, they come primarily from radiation), and the off-diagonal elements arise from the overlap of the two radiation patterns, also known as external coupling of resonances via the continuum. Modelling the reflectivity using temporal coupled-mode theory (TCMT), we show that when matrix B is dominated by radiation, the reflection peaks occur near the eigenvalues λ1,2 of the Hermitian part A and are independent of the anti-Hermitian part −iB (see Supplementary Information section VII and Supplementary Fig. 6 for details). Therefore, the linear Dirac dispersion observed in the measured data of Fig. 2c (dark red) indicates that we have successfully achieved accidental degeneracy in the eigenvalues of the Hermitian part, consistent with the simplified model in equation (1). In Supplementary Fig. 8b, we plot the values of λ1,2 extracted from the reflectivity data through a more rigorous data analysis using TCMT (described below); the linear dispersion is indeed observed. We note that when there is substantial non-radiative loss or material gain in the system, the reflection peaks no longer follow the eigenvalues of the Hermitian part (see Supplementary Information section VIII and Supplementary Fig. 7).

17 SEPTEMBER 2015 | VOL 525 | NATURE | 355

©2015 Macmillan Publishers Limited. All rights reserved

LETTER

Figure 2 | Experimental reflectivity spectrum and accidental Dirac dispersion. a, SEM images of the photonic crystal samples: side view (upper panel) and top view (lower panel). b, Schematic drawing of the measurement set-up. Linearly polarized light from a super-continuum source is reflected off the photonic crystal slab (‘sample’) immersed in an optical liquid, and collected by a spectrometer (SP). The incident angle θ is controlled using a precision rotating stage. BS, beam splitter. c, Reflectivity spectrum of the sample measured experimentally (upper panel) and calculated numerically (lower panel) along the Γ–X and Γ–M directions. The peak location of reflectivity reveals the Hermitian part of the system, which forms Dirac dispersion due to accidental degeneracy. In the lower panel, white solid lines indicate the real part of the eigenvalues; spectra and eigenvalues at three representative angles (marked by dashed lines and circles) are shown in d. d, Three line cuts of reflectivity R from simulation results. Also shown are the complex eigenvalues (open circles) calculated numerically. At large angles (0.8°), the two resonances are far apart, so the reflectivity peaks (red arrows) are close to the actual positions of the complex eigenvalues. However, at small angles (0.3°, 0.1°), the coupling between resonances causes the resonance peaks (red arrows) to have much greater separations in frequency than the complex eigenvalues. The black arrows mark the dips in reflectivity that correspond to the coupled-resonator-induced transparency (CRIT; see text for details).
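The split H = A − iB can be made concrete with a few lines of NumPy: A = (H + H†)/2 and B = i(H − H†)/2 are both Hermitian. The sketch below assumes a single radiation channel, so that B12 = √(γ1γ2), and uses illustrative values that reduce to the Dirac case of equation (1) (ω1 = ω2 = ω0, κ = v_g|k|, γ1 = 0 for the quadrupole, γ2 = γ_d for the dipole) evaluated exactly on the ring. It only compares eigenvalues and does not model the reflectivity itself (the TCMT expression of Supplementary equation (S20) is not reproduced here).

```python
import numpy as np


def split_hermitian(h):
    """Return (A, B) with H = A - iB and both A and B Hermitian."""
    h = np.asarray(h, dtype=complex)
    a = 0.5 * (h + h.conj().T)
    b = 0.5j * (h - h.conj().T)
    return a, b


def coupled_resonances(w1, w2, kappa, g1, g2):
    """Two-resonance Hamiltonian A - iB, ASSUMING single-channel radiative
    coupling so that the off-diagonal loss term is sqrt(g1 * g2)."""
    a = np.array([[w1, kappa], [kappa, w2]], dtype=complex)
    c = np.sqrt(g1 * g2)
    b = np.array([[g1, c], [c, g2]], dtype=complex)
    return a - 1j * b


# Illustrative values (hypothetical), with v_g = 1 so that kappa = |k| = k_c.
omega0, gamma_d = 1.0, 0.2
k_c = gamma_d / 2.0
h = coupled_resonances(omega0, omega0, k_c, 0.0, gamma_d)  # sits on the EP
a, b = split_hermitian(h)

print("eigenvalues of H:", np.linalg.eigvals(h))   # both close to 1 - 0.1j: an EP
print("eigenvalues of A:", np.linalg.eigvalsh(a))  # 0.9 and 1.1: still split
```

This mirrors the 0.31° case in Fig. 2d: the complex eigenvalues of H coincide, while the eigenvalues of the Hermitian part, which set the reflection-peak positions, remain separated by γ_d.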

The real parts of the complex eigenvalues of the Hamiltonian, Re(ω±), behave very differently from the reflectivity peaks. Simulation results (solid white lines in the lower panel of Fig. 2c) show that Re(ω±) is dispersionless at small angles, with a branch-point singularity around 0.31°, consistent with the feature predicted by the simplified Hamiltonian in equation (2). In Fig. 2d, we compare the reflectivity spectra from simulations (with peaks indicated by red arrows) with the corresponding complex eigenvalues at three representative angles (0.8° in blue, 0.31° in green and 0.1° in magenta). At 0.31°, the two complex eigenvalues are degenerate, indicating an EP; however, the two reflection peaks do not coincide, because they represent the eigenvalues of only the Hermitian part of the Hamiltonian, which has no degeneracy here. The dip in reflectivity between the two peaks (marked by black arrows in Figs 2 and 3) is the coupled-resonator-induced transparency (CRIT) that arises from the interference between radiation of the two resonances, similar to electromagnetically induced transparency.

Qualitatively, the peak locations of the measured reflectivity spectrum reveal the eigenvalues of the Hermitian part, A, and the linewidths of the peaks reveal the anti-Hermitian part, −iB; diagonalizing A − iB yields the eigenvalues ω±, as illustrated in equation (3). To be more quantitative, we use TCMT and account for both the direct (non-resonant) and the resonant reflection processes, including nearby resonances; the expression for reflectivity is given in Supplementary Information equation (S20), with the full derivation given in Supplementary Information section IX. Fitting the reflectivity curves with the TCMT expression gives an accurate estimate of the matrix elements and the eigenvalues; this procedure is the same as our approach in ref. 27, except that here we additionally account for the coupling between resonances. Figure 3a compares the fitted and the measured reflectivity curves at three representative angles (with more comparisons in Supplementary Fig. 8a); the excellent agreement shows the validity of the TCMT model. Underneath the reflectivity curves, we show the complex eigenvalues. The difference between the numerically calculated reflectivity (Fig. 2d) and the experimental results (Fig. 3a) stems from the non-radiative decay channels in our system, mostly due to scattering loss from the surface roughness.

Repeating the fitting procedure for the reflectivity spectra measured at different angles, we obtain the dispersion curves for all complex eigenvalues, which are plotted in Fig. 3b. Along both directions in k space (Γ–X and Γ–M), the two bands of interest (shown in blue and red) exhibit the EP behaviour predicted in equation (2): for |k| < k_c, the real parts are degenerate and dispersionless; for |k| > k_c, the imaginary parts are degenerate and dispersionless; and for |k| in the vicinity of k_c, branching features are observed in the real or imaginary part. In Fig. 3c, we plot the eigenvalues on the complex plane for both the Γ–X and Γ–M directions. In both directions, the two eigenvalues approach each other and become very close at a certain k point, which is a clear signature of the system being very close to an EP.

We have shown that non-Hermiticity arising from radiation can significantly alter fundamental properties of the system, including the band structure and the density of states; this effect becomes most prominent near EPs. The photonic crystal slab described here provides a simple-to-realize platform for studying the influence of EPs on light–matter interaction, such as for single-particle detection and modulation of quantum noise. The two-dimensional flat band can also provide a high density of states and therefore high Purcell factors. The strong dispersion of loss in the vicinity of the Γ point can improve the performance of large-area single-mode photonic crystal lasers. The deformation into an exceptional ring is a general phenomenon that can also be achieved with material gain or loss and for Dirac points in other lattices. Further studies could advance the understanding of the connection between the topological property of Dirac points and that of EPs in general non-Hermitian wave systems, and our method could go beyond photons to phonons, electrons and atoms.

Figure 3 | Experimental demonstration of an exceptional ring. a, Examples of reflection spectrum R from the sample at three different angles (0.8° blue, 0.3° green and 0.1° magenta, solid lines) measured with s-polarized light along the Γ–X direction (same set-up as in the numerical simulations shown in Fig. 2d), fitted with the TCMT expression (equation (S20) in Supplementary Information; black dashed lines). At each angle, the positions of the complex eigenvalues extracted experimentally are shown as open circles. b, Complex eigenvalues extracted experimentally (symbols), compared with numerical simulation results (dashed lines), for both the real part (left panel) and the imaginary part (right panel). Red squares and dashed lines are used for the band with zero radiation loss at the Γ point, blue circles and dashed lines for the band with finite radiation loss at the Γ point, and grey crosses and dashed lines for the third band, which is decoupled from the previous two owing to symmetry. The orange shaded regions correspond to the inside of the ring. c, Positions of the eigenvalues (red and blue dashed lines) approach and become very close to each other (indicated by the two brown arrows), demonstrating near-EP features in different directions in momentum space and the existence of an exceptional ring.
Inhomogeneity of charge-density-wave order and quenched disorder in a high-Tc superconductor

G. Campi, A. Bianconi, N. Poccia, G. Bianconi, L. Barba, G. Arrighetti, D. Innocenti, J. Karpinski, N. D. Zhigadlo, S. M. Kazakov, M. Burghammer, M. v. Zimmermann, M. Sprung & A. Ricci


It has recently been established that the high-transition-temperature (high-Tc) superconducting state coexists with short-range charge-density-wave order and quenched disorder arising from dopants and strain. This complex, multiscale phase separation invites the development of theories of high-temperature superconductivity that include complexity. The nature of the spatial interplay between charge and dopant order that provides a basis for nanoscale phase separation remains a key open question, because experiments have yet to probe the unknown spatial distribution at both the nanoscale and the mesoscale (between atomic and macroscopic scale). Here we report micro X-ray diffraction imaging of the spatial distribution of both short-range charge-density-wave ‘puddles’ (domains with only a few wavelengths) and quenched disorder in HgBa2CuO4+y, the single-layer cuprate with the highest Tc, 95 kelvin (refs 26–28). We found that the charge-density-wave puddles, like the steam bubbles in boiling water, have a fat-tailed size distribution that is typical of self-organization near a critical point. However, the quenched disorder, which arises from oxygen interstitials, has a distribution that is contrary to the usually assumed random, uncorrelated distribution. The interstitial-oxygen-rich domains are spatially anticorrelated with the charge-density-wave domains, because higher doping does not favour the stripy charge-density-wave puddles, leading to a complex emergent geometry of the spatial landscape for superconductivity.

Although it is known that the incommensurate charge-density-wave (CDW) order in cuprates (copper oxides) is made of ordered, stripy, nanoscale ‘puddles’ with an average of only 3–4 oscillations, information about the size distribution and spatial organization of these puddles has so far not been available. We present experiments demonstrating that CDW puddles have a complex spatial distribution and coexist with, but are spatially anticorrelated to, quenched disorder in HgBa2CuO4+y (Hg1201). The sample we studied is a layered perovskite at optimum doping with oxygen interstitials (y = 0.12), tetragonal symmetry (P4/mmm) and a low misfit strain. The X-ray diffraction (XRD) measurements (see Methods) show diffuse CDW satellites (secondary peaks surrounding a main peak) at q_CDW = (0.23a*, 0.16c*) in the b* = 0 plane and q_CDW = (0.23b*, 0.16c*) in the a* = 0 plane (where a*, b* and c* are the reciprocal lattice units) around specific Bragg peaks, such as (108), below the onset temperature T_CDW = 240 K (see Fig. 1a). The component of the momentum transfer q_CDW in the CuO2 plane (0.23a*) is smaller in this case than in the underdoped case (0.28a*). The temperature evolution of the CDW-peak profile along a* (in the h direction; Fig. 1b) shows a smeared, glassy-like evolution for temperatures below T_CDW. The CDW-peak intensity reaches a maximum at T = 100 K, followed by a drop associated with the onset of superconductivity at T = Tc. We investigated the isotropic character of the CDW in the a–b plane using azimuthal scans, as shown in Fig. 1c. We observed an equal probability of vertically and horizontally striped CDW puddles.

Our main result is the discovery of the statistical spatial distribution of the CDW-puddle size and density throughout the sample, which reveals an emergent complex network geometry for the superconducting phase. We performed scanning micro X-ray diffraction (SμXRD) measurements (see Methods) to extend the imaging of spatial inhomogeneity previously obtained by scanning tunnelling microscopy (STM) from the surface to the bulk of the sample, and from nanoscale to mesoscale spatial inhomogeneity. Clear evidence of the inhomogeneous spatial distribution of the CDW is provided by the very different CDW-peak profiles collected at different illuminated sample spots (see Fig. 1d), corresponding to spots with 'large' and 'small' puddles.

We investigated the temperature dependence of CDW domains by recording the CDW-peak intensity and its full-width at half-maximum (FWHM) during cooling from 280 K to 85 K. We collected the data in two different places on the sample, corresponding to 'large' and 'small' CDW puddles. Figure 1e, f shows the temperature dependence of the population (intensity), the number of oscillations h_CDW/Δh_CDW (where h_CDW and Δh_CDW are the position and the FWHM of the CDW peak profile in units of a*, respectively) and the in-plane puddle size ξ_a (along the a axis) in large (red filled circles) and small (black filled squares) CDW puddles. The broad phase transition appears to be arrested, as indicated by the size of the CDW puddles, ξ_a = 1/Δh_CDW, which does not diverge below T_CDW. This behaviour is typical of low-dimensional systems with quenched disorder. A map of the spatial organization of the CDW-puddle size is shown in Fig. 1g, and the probability density function (PDF) of the in-plane CDW-puddle size ξ_a is shown in Fig. 1h.
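The quantities plotted in Fig. 1f follow directly from a fitted peak. The sketch below is a minimal illustration, not the authors' analysis code: it converts a Gaussian standard deviation σ into a FWHM, and a FWHM Δh_CDW into the number of oscillations h_CDW/Δh_CDW and the puddle size ξ_a = 1/Δh_CDW. It assumes that ξ_a (in lattice units when Δh_CDW is in units of a*) is converted to nanometres by multiplying by the in-plane lattice parameter a = 0.387480 nm given in Methods, and it ignores any instrumental-resolution correction; the function names are hypothetical.

```python
import math

# In-plane lattice parameter of Hg1201 at 100 K (Methods), in nm.
A_NM = 0.387480


def gaussian_fwhm(sigma: float) -> float:
    """FWHM of a Gaussian with standard deviation sigma: 2*sqrt(2 ln 2)*sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma


def cdw_oscillations(h_cdw: float, delta_h: float) -> float:
    """Number of CDW oscillations in a puddle, h_CDW / Delta h_CDW (both in units of a*)."""
    return h_cdw / delta_h


def puddle_size_nm(delta_h: float, a_nm: float = A_NM) -> float:
    """Puddle size xi_a = 1/Delta h_CDW in lattice units, converted to nm (assumed: times a)."""
    return a_nm / delta_h


# Peak widths quoted in Methods for 'large' and 'small' puddles (units of a*).
for label, delta_h in [("large", 0.033), ("small", 0.089)]:
    print(label,
          round(cdw_oscillations(0.23, delta_h), 1),
          round(puddle_size_nm(delta_h), 1))
```

With these inputs, the 0.089a* width corresponds to about 2.6 oscillations and ξ_a ≈ 4.4 nm, and the 0.033a* width to about 7.0 oscillations and ξ_a ≈ 11.7 nm.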

The PDF has a long fat tail that extends over an order of magnitude, and is fitted by $\mathrm{PDF}(\xi_a) \sim \xi_a^{-\alpha_{\mathrm{CDW}}} \exp(-\xi_a/\xi_0)$, where $\alpha_{\mathrm{CDW}} = 2.8 \pm 0.1$ is the critical exponent of the puddle-size power-law distribution and $\xi_0 > 40$ nm. Although the average size of CDW puddles is 4.3 nm (in agreement with previous work), PDF(ξ_a) has a non-Gaussian shape, and rare, larger puddles reaching sizes of 40 nm are detected. Our finding of a fat-tailed distribution for the CDW-puddle size is in agreement with previous results obtained by STM. Such structures, in which spontaneous breaking of both translational symmetry (CDW electronic crystalline phase) and gauge symmetry (superconductivity) coexist, have been called superstripes. The distribution of the CDW puddles we have found introduces a substantial topological change to the space available for superconductivity: the current running from a point A to a point B of the material can take different paths (see Fig. 1i) that are not topologically equivalent, thus forming an emergent complex hyperbolic geometry.

Figure 1 | Temperature dependence and spatial distribution of CDW puddles in Hg1201. a, The CDW satellite near the (108) Bragg peak appears below 240 K. b, Temperature dependence of CDW-peak profiles along h. The CDW-peak intensity I_CDW is measured as the number of counts minus the background. c, The q_CDW = (0.23a*(b*), 0.16c*) peak profile at different azimuthal angles φ, showing the peak isotropy. d, Two typical CDW peaks collected at two different places in the same crystal. Red filled circles correspond to the diffraction profile from an illuminated part of the sample with large CDW puddles (red in g); black filled squares correspond to an illuminated part of the sample with small CDW puddles (blue in g); the solid lines are Gaussian fits. e, The CDW-peak intensity as a function of temperature at the two different places on the sample corresponding to large (red filled circles, right axis) and small (black filled squares, left axis) CDW puddles. The dashed line corresponds to T = Tc and the dotted line to T = T_CDW. f, Evolution of the number of CDW oscillations (h_CDW/Δh_CDW) inside a CDW puddle and of the CDW domain size along the a axis (ξ_a). g, h, Spatial map (g) and probability density function (h) of the CDW-puddle size. Scale bar in g, 10 μm. i, A schematic of non-equivalent paths, running in the interface space between CDW puddles, connecting point A to point B in the emergent complex non-Euclidean spatial geometry for the superconducting current.

360 | NATURE | VOL 525 | 17 SEPTEMBER 2015

To investigate the interplay between the CDW puddles and the quenched disorder, we studied the spatial distribution of oxygen defects. The quenched lattice disorder is due to oxygen interstitials (O_i), which form O_i atomic stripes in the HgOδ layers, in agreement with previous experiments. HgBa2CuO4+y (ref. 28), like YBa2Cu3O6+y (ref. 15) and La2CuO4+y (refs 14, 16), shows Tc variations owing to the effect of the spatial organization of O_i on superconductivity. The average O_i self-organization was detected by high-energy XRD (see Methods). Figure 2a shows a portion of the h-k plane of reciprocal space (0 < h < 5), where there is strong evidence of diffuse streaks running along the a* and b* directions and crossing all the Bragg peaks. Our high-energy XRD data confirm the formation of O_i stripes intercalated between the CuO2 planes, in both the (100) and (010) directions.

The spatial distribution of the intensity of the streaks was obtained by SμXRD (see Methods). We measured the reciprocal a*-c* plane (or b*-c* plane) around the (006) Bragg peak in reflection geometry. The O_i stripes in Hg1201 run along the a* (b*) direction with no correlation along the c* direction; therefore, they also lead to streaks on the a*-c* plane. A schematic of the O_i atomic stripes is shown in Fig. 2b. Figure 2c shows the spatial map of the streak intensity, with rich (bright yellow) and poor (dark black) regions of O_i stripes. The PDF of the O_i-rich regions in Fig. 2d is fitted by $\mathrm{PDF}(I) \sim (I/I_0)^{-\alpha_{\mathrm{O}_i}} \exp(-I/I_c)$, where $I_0$ is the average intensity, $\alpha_{\mathrm{O}_i} = 2.0 \pm 0.1$ is the critical exponent and $I_c > 20$.
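Both fitted forms are power laws with an exponential cut-off. The authors' statistical code is not available (see Code availability), so the following is only a sketch of one common approach, assumed here rather than taken from the paper: taking logarithms gives log PDF = c − α log x − x/x0, which is linear in (c, α, 1/x0), so an unweighted linear least-squares fit recovers the parameters. For the O_i intensities, x would be I/I_0 and the fitted cut-off would be I_c/I_0. The function names are hypothetical, and empty histogram bins must be dropped before taking logarithms.

```python
import math


def _solve3(m, v):
    """Solve the 3x3 system m @ beta = v by Gaussian elimination with partial pivoting."""
    a = [list(row) + [vi] for row, vi in zip(m, v)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= factor * a[col][k]
    beta = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        s = a[r][3] - sum(a[r][k] * beta[k] for k in range(r + 1, 3))
        beta[r] = s / a[r][r]
    return beta


def fit_truncated_power_law(x, pdf):
    """Fit PDF(x) ~ C * x**(-alpha) * exp(-x / x0) by least squares in log space.

    Returns (alpha, x0, log_C). Points with x <= 0 or pdf <= 0 are skipped.
    x0 is math.inf if the fitted linear term shows no exponential cut-off.
    """
    rows = [(1.0, math.log(xi), xi, math.log(p))
            for xi, p in zip(x, pdf) if xi > 0 and p > 0]
    if len(rows) < 3:
        raise ValueError("need at least three positive points")
    # Normal equations (A^T A) beta = A^T y for the design matrix A = [1, log x, x].
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * r[3] for r in rows) for i in range(3)]
    log_c, slope_log, slope_lin = _solve3(ata, aty)
    alpha = -slope_log
    x0 = -1.0 / slope_lin if slope_lin < 0 else math.inf
    return alpha, x0, log_c


# Synthetic check: noise-free data generated with alpha = 2.8 and x0 = 50.
xs = [1.0 + 0.5 * i for i in range(80)]
ps = [3.0 * x ** -2.8 * math.exp(-x / 50.0) for x in xs]
print(fit_truncated_power_law(xs, ps))
```

On noisy, binned data an unweighted log-space fit over-weights sparsely populated tail bins, so a maximum-likelihood fit on the raw values is often preferred; the linear version is shown because each step is easy to follow.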

Figure 2 | Correlated quenched disorder due to O_i atomic stripes in Hg1201. a, A portion of the h-k diffraction pattern. Resolution-limited streaks connect the Bragg peaks, owing to the formation of O_i stripes in the HgOδ spacer layers. b, Schematic representation of the atomic O_i stripes. c, SμXRD map of a sample region showing the relative O_i streak intensity I_Oi. The bright (dark) spots correspond to sample regions with a high (low) density of O_i atomic stripes, called O_i-rich (poor) regions. Scale bar, 10 μm. d, Probability density function calculated from the O_i-streak intensity map.

Figure 3 | Spatial anticorrelation between CDW-rich and O_i-rich regions. a, The CDW-rich regions (red) in the CuO2 planes and O_i-rich regions (blue) in the HgOδ layers. b, Surface plot of the difference map (see Methods) between the CDW-peak and O_i-streak intensity. The positive (green to red) values indicate the CDW-rich regions and the negative (green to blue) values correspond to O_i-rich regions. Scale bar, 5 μm. c, Scatter plot of O_i versus CDW intensity demonstrating the negative correlation between CDW-puddle and O_i-stripe populations. d, Segmentations of the difference map in b highlighting the network of CDW-rich domains (left panel) and O_i-rich regions (right panel). Scale bar, 10 μm. e, A schematic of the nanoscale texture formed by CDW-rich regions (red spots) and the 'charge-O_i-rich' region (light blue area), which define an interface space and loci of the superconductivity with a complex non-Euclidean geometry.

In Fig. 3 we present results on the spatial interplay between CDW-rich regions and O_i-rich regions. We calculated the 'difference map' (see Methods) between CDW peaks and O_i diffuse streaks. The CDW-poor regions in the CuO2 basal plane correspond to O_i-rich regions in the HgOδ layers, as illustrated in Fig. 3a. The CDW puddles and O_i-rich regions give rise to the positive and negative peaks, respectively, in the surface plot shown in Fig. 3b. The spatial anticorrelation is evident from the scatter plot of O_i intensity versus CDW intensity (Fig. 3c): as the O_i intensity increases, the CDW intensity decreases, and vice versa. This is consistent with the fact that excess O_i means higher doping, and high doping does not favour stripy, underdoped, short-range CDW order. Figure 3d shows the two maps obtained via segmentation of the difference map, and provides a direct image of how doping-poor (CDW-rich regions, shown in red) and doping-rich (O_i-rich regions, shown in blue) phases are arranged in different regions of the material. Figure 3e illustrates the nanoscale configuration of CDW puddles (red spots) in the CuO2 plane using the experimental distribution of CDW size; this distribution generates 'holes' in the space available for the free electrons (light blue area). This space is topologically interesting: there are an infinite number of ways for a current path to connect a point A to a point B around the CDW puddles, which are distinguished not only by the number of times a path goes around a single hole, but also by the way the path passes through the pattern of CDW puddles. The complex space that emerges from the mesoscopic phase separation, both in the spacer layers and in the CuO2 plane, substantially changes (1) the dielectric constant that controls the long-range Coulomb interaction relevant for phase separation near a Lifshitz transition, (2) the dielectric constant relevant to the electron-electron interaction in the pairing, and (3) the geometrical and topological properties of the space available for the overall phase coherence of the macroscopic quantum condensate, which is made up of multiple condensates at the nanoscale with a single critical temperature.

This work offers new insight into the complexity of nanoscale phase-separation phenomena in high-temperature superconductors. More generally, our results address the effects of quenched disorder on phase transitions. A phase transition that would be first order in the clean limit is smeared into a continuous-looking transition in the presence of a random, Gaussian-distributed, quenched-disorder background. Here the disorder itself is not randomly distributed but has a long-tailed probability density function, leading to correlated disorder. Even in the 'ideal' single-layer cuprate superconductor HgBa2CuO4+y at optimum doping (Tc = 95 K), the CDW order self-organizes into puddles, forming an inhomogeneous landscape with an emergent complex network geometry. Our results provide further evidence for the universality of mesoscale phase separation even in the most optimized superconducting cuprates, which implies that superconductivity is non-uniform throughout what is a granular medium.


Online Content: Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper.


METHODS


Sample preparation and characterization. The HgBa2CuO4+y (Hg1201) crystal with y = 0.12, grown at ETH, has a sharp superconducting transition at Tc = 95 K. The crystal structure has P4/mmm symmetry, with lattice parameters a = b = 0.387480(5) nm and c = 0.95078(2) nm at T = 100 K (numbers in parentheses indicate the standard deviation of the last digit).

XRD measurements using the XRD1 beamline. To identify the CDW order in a single Hg1201 crystal, we used XRD at the XRD1 beamline of the Elettra synchrotron radiation facility in Trieste, Italy, tuning the photon energy between 13 keV and 16 keV with a beam size of 200 × 200 μm². Only selected reflections show clear CDW satellites, in agreement with ref. 2. We focused on the CDW satellite located at q_CDW = (0.23, 0, 0.16) around the (108) Bragg reflection, which appeared as the sample was cooled below 240 K. Typical diffraction patterns collected at 85 K, 105 K and 280 K are shown in Fig. 1a. To give a direct view of the temperature dependence of the CDW-satellite reflection for T = 280-85 K, Fig. 1b shows a two-dimensional colour plot of the CDW-peak profile along the a* direction as a function of temperature.
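To relate the reciprocal-lattice positions above to scattering angles, the sketch below uses the standard tetragonal relation 1/d² = (h² + k²)/a² + l²/c², with the lattice parameters from Sample preparation, together with Bragg's law. It is only a geometric aid: the satellite is assumed to sit at (1 + 0.23, 0, 8 + 0.16), although the text does not say on which side of (108) it was measured, and the wavelength is taken as λ[Å] = 12.39842/E[keV]. The function names are hypothetical.

```python
import math

HC_KEV_ANGSTROM = 12.39842  # hc in keV*Angstrom, so lambda[A] = 12.39842 / E[keV]
A_ANG, C_ANG = 3.87480, 9.5078  # Hg1201 lattice parameters at 100 K (Methods), in Angstrom


def d_spacing_tetragonal(h, k, l, a=A_ANG, c=C_ANG):
    """Interplanar spacing for (possibly non-integer) indices in a tetragonal cell."""
    inv_d2 = (h * h + k * k) / (a * a) + (l * l) / (c * c)
    return 1.0 / math.sqrt(inv_d2)


def two_theta_deg(d, energy_kev):
    """Scattering angle 2*theta in degrees from Bragg's law, lambda = 2 d sin(theta)."""
    wavelength = HC_KEV_ANGSTROM / energy_kev
    s = wavelength / (2.0 * d)
    if s > 1.0:
        raise ValueError("reflection not reachable at this photon energy")
    return math.degrees(2.0 * math.asin(s))


for label, hkl in [("(108) Bragg", (1, 0, 8)), ("CDW satellite (assumed)", (1.23, 0, 8.16))]:
    d = d_spacing_tetragonal(*hkl)
    print(f"{label}: d = {d:.4f} A, 2theta at 13 keV = {two_theta_deg(d, 13.0):.2f} deg")
```

Tuning the photon energy between 13 keV and 16 keV, as described above, only changes the wavelength passed to `two_theta_deg`; the d-spacings are fixed by the lattice.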

High-energy XRD measurements using the BW5 beamline. High-energy XRD measurements were made at the BW5 beamline at DESY, Hamburg, Germany, in transmission geometry with an X-ray energy of 100 keV. A single SiGe(111) gradient monochromator was used. The beam size was 200 μm × 200 μm. We used a vertical rotation axis, and the c axis of the single crystal was oriented parallel to the direction of the incoming X-ray beam. The diffraction patterns were collected by an area detector in the temperature range 20-300 K. In this geometry, we can probe the lattice fluctuations in the a-b plane. Possible CDW-peak anisotropy was probed using azimuthal scans (0° ≤ φ ≤ 90°), as shown in the schematic in Fig. 1c. The CDW-peak amplitude and FWHM do not change substantially as a function of φ, as shown in the colour plot of the diffraction profile (as a function of φ) in Fig. 1c; this plot instead shows the in-plane isotropy of the CDW, in agreement with the tetragonal P4/mmm symmetry of the lattice. Resolution-limited streaks connecting the Bragg peaks, which arise from the organization of single O_i stripes in the mercury spacer layer, are shown in Fig. 2a, a portion of the h-k diffraction pattern collected at DESY. The spatial distribution of the O_i atomic stripes in the HgOδ spacer layers, which cause one-dimensional doping and lattice spatial inhomogeneity, does not vary with temperature below 250 K, leading to quenched disorder at the onset of the charge-density-wave phase.

SμXRD measurements using the ID13 beamline. SμXRD experiments were performed in reflection geometry using the ID13 beamline at ESRF, Grenoble, France, with an incident X-ray energy of 13 keV. By moving the sample under a 1-μm focused beam with an x-y translator, we scanned a sample area of 65 × 80 μm², collecting 5,200 different diffraction patterns at T = 100 K. For each scanned point of the sample, the q_CDW-peak profile was extracted; the FWHMs along the a*(b*) and c* directions were evaluated to obtain the domain size of the charge-ordered regions along the a(b) and c crystallographic axes.
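The 5,200 patterns match a 65 × 80 raster, which suggests 1-μm steps, although the step size is not stated explicitly. The sketch below shows how per-point values (for example ξ_a from each pattern) could be arranged into a map like Fig. 1g and turned into an empirical PDF like Fig. 1h; logarithmic bins are used because the distribution is fat-tailed. The row-by-row raster order, the bin count, the synthetic lognormal stand-in data and the function names are all assumptions for illustration, not the authors' procedure.

```python
import math
import random

NX, NY = 65, 80  # assumed raster: 65 x 80 points -> 5,200 patterns


def to_grid(values, nx=NX, ny=NY):
    """Reshape a flat, row-by-row list of per-point values into ny rows of nx points."""
    if len(values) != nx * ny:
        raise ValueError(f"expected {nx * ny} values, got {len(values)}")
    return [values[row * nx:(row + 1) * nx] for row in range(ny)]


def log_binned_pdf(values, n_bins=12):
    """Empirical PDF on logarithmically spaced bins.

    Returns (left_edge, right_edge, density) for each non-empty bin; densities are
    normalized so that sum(density * width) == 1 over the positive values used.
    """
    vals = [v for v in values if v > 0]
    lo, hi = min(vals), max(vals)
    if hi == lo:
        raise ValueError("all values are equal; cannot build log bins")
    ratio = math.log(hi / lo)
    edges = [lo * math.exp(ratio * i / n_bins) for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in vals:
        idx = min(int(n_bins * math.log(v / lo) / ratio), n_bins - 1)
        counts[idx] += 1
    out = []
    for i, n in enumerate(counts):
        if n:
            width = edges[i + 1] - edges[i]
            out.append((edges[i], edges[i + 1], n / (len(vals) * width)))
    return out


# Synthetic stand-in for a puddle-size map (lognormal values in nm), not real data.
rng = random.Random(0)
sizes = [rng.lognormvariate(math.log(4.3), 0.5) for _ in range(NX * NY)]
grid = to_grid(sizes)
for left, right, dens in log_binned_pdf(sizes):
    print(f"{left:6.2f}-{right:6.2f} nm: {dens:.4f}")
```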
Two quite different CDW-peak profiles in the same crystal, measured along the a*(b*) direction in the a*-c* (b*-c*) plane, are shown in Fig. 1d. These two typical profiles were collected at two different spatial locations in the same crystal, corresponding to large (red circles) and small (black squares) puddles. The continuous lines are Gaussian fits to the data. The different amplitudes and FWHMs ((0.033 ± 0.001)a* and (0.089 ± 0.001)a* in the upper and lower panels of Fig. 1d, respectively; errors indicate standard deviation) of the two peaks, which correspond to large and small CDW puddles, provide evidence of a strong inhomogeneity in the CDW spatial distribution. The peak profiles appear the same along the a* and b* directions, confirming the peak isotropy in the basal plane of the tetragonal lattice. The intensity of the CDW satellites as a function of temperature, measured at two different locations on the sample corresponding to large (red) and small (black) CDW puddles, is shown in Fig. 1e. The vertical lines represent the superconducting transition temperature Tc and the CDW onset temperature T_CDW. The order-disorder transition is very broad, which indicates the role of quenched disorder owing to the presence of defects. Moreover, the CDW intensity shows a clear drop around Tc that appears to depend on the CDW-puddle size. The temperature dependence of the number of CDW oscillations inside a single puddle (h_CDW/Δh_CDW) and of the domain size of a single puddle along the a(b) axis (ξ_a) is shown in Fig. 1f (Δh_CDW and h_CDW are the FWHM and the position along a* of the CDW peak; the domain size along the a axis (b axis) is given by the correlation length ξ_a). The inhomogeneity of the CDW distribution is depicted in the 65 × 80 μm² XRD map of the (nanoscale) size of CDW domains in Fig. 1g, which shows loci of large (red-yellow area) and small (blue area) CDW puddles; the scale bar corresponds to 10 μm.

Using the ID13 microfocus beamline at ESRF, we can also detect the spatial distribution of the quenched disorder. Figure 2c shows the SμXRD map of the integrated intensity of the O_i-stripe streaks. The bright (dark) spots correspond to sample regions with a high (low) density of O_i atomic stripes, called O_i-rich (poor) regions; the scale bar is 10 μm. Figure 2d shows the PDF of the O_i-streak intensity obtained from the SμXRD map, that is, the probability distribution of the O_i-rich regions. The experimental set-up allows us to investigate the spatial interplay between CDW puddles in the CuO2 plane and O_i-rich domains in the HgOδ layers, shown in Fig. 3. We computed the 'difference map' (I_CDW − I_Oi), where I_CDW and I_Oi are the intensities of the q_CDW peak and the O_i diffuse streaks, respectively, each normalized to [0, 1]. The surface plot of this difference map is shown in Fig. 3b. The positive (green to red) peaks indicate CDW-puddle-rich regions and the negative (green to blue) peaks indicate O_i-rich regions. The spatial anticorrelation between CDW puddles and O_i atomic stripes is revealed by segmentation of the difference map. We use this segmentation to visualize the phase separation into the network of CDW-rich domains, which correspond to 'charge-poor' domains in the CuO2 planes (left panel of Fig. 3d), and O_i-rich regions in the HgOδ layers, which correspond to 'charge-rich' portions of the CuO2 plane (right panel of Fig. 3d).
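The difference-map construction described above can be sketched in a few lines. This version is an illustration under stated assumptions: both maps are co-registered grids of the same shape, each is min-max normalized to [0, 1] over the whole map, and segmentation uses a simple symmetric threshold on the sign of the difference. The paper does not specify its segmentation rule, so the `threshold` parameter and the function names are hypothetical.

```python
def normalize(grid):
    """Min-max normalize a 2-D grid (list of rows) to [0, 1]."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # a constant map becomes all zeros
    return [[(v - lo) / span for v in row] for row in grid]


def difference_map(i_cdw, i_oi):
    """Pointwise I_CDW - I_Oi after normalizing each map to [0, 1]."""
    if len(i_cdw) != len(i_oi) or any(len(r1) != len(r2) for r1, r2 in zip(i_cdw, i_oi)):
        raise ValueError("maps must have the same shape")
    a, b = normalize(i_cdw), normalize(i_oi)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def segment(diff, threshold=0.0):
    """Boolean masks: CDW-rich where diff > threshold, O_i-rich where diff < -threshold."""
    cdw_rich = [[v > threshold for v in row] for row in diff]
    oi_rich = [[v < -threshold for v in row] for row in diff]
    return cdw_rich, oi_rich


# Toy 2 x 2 maps (made-up numbers): CDW and O_i intensities vary in opposite senses.
cdw = [[0.0, 10.0], [5.0, 10.0]]
oi = [[10.0, 0.0], [5.0, 0.0]]
diff = difference_map(cdw, oi)
print(diff)  # [[-1.0, 1.0], [0.0, 1.0]]
print(segment(diff))
```

A non-zero `threshold` leaves points with nearly equal normalized intensities unassigned, which is one simple way to keep noise near zero out of both masks.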

Code availability. The code we used for statistical analysis of the SμXRD data is not currently available (G.C., A.R. and A.B., manuscript in preparation).
Designing switchable polarization and magnetization at room temperature in an oxide
Ferroelectric and ferromagnetic materials exhibit long-range order of atomic-scale electric or magnetic dipoles that can be switched by applying an appropriate electric or magnetic field, respectively. Both switching phenomena form the basis of non-volatile random access memory, but in the ferroelectric case this involves destructive electrical reading, and in the magnetic case a high writing energy is required. In principle, low-power and high-density information storage that combines fast electrical writing and magnetic reading can be realized with magnetoelectric multiferroic materials. These materials not only simultaneously display ferroelectricity and ferromagnetism, but also enable magnetic moments to be induced by an external electric field, or electric polarization by a magnetic field. However, synthesizing bulk materials with both long-range orders at room temperature in a single crystalline structure is challenging, because conventional ferroelectricity requires closed-shell d⁰ or s² cations, whereas ferromagnetic order requires open-shell dⁿ configurations with unpaired electrons. These opposing requirements pose considerable difficulties for atomic-scale design strategies such as magnetic ion substitution into ferroelectrics. One material that exhibits both ferroelectric and magnetic order is BiFeO3, but its cycloidal magnetic structure precludes bulk magnetization and linear magnetoelectric coupling. A solid solution of a ferroelectric and a spin-glass perovskite combines switchable polarization with glassy magnetization, although it lacks long-range magnetic order. Crystal engineering of a layered perovskite has recently resulted in room-temperature polar ferromagnets, but the electrical polarization has not been switchable. Here we combine ferroelectricity and ferromagnetism at room temperature in a bulk perovskite oxide by constructing a percolating network of magnetic ions with strong superexchange interactions within a structural scaffold exhibiting polar lattice symmetries at a morphotropic phase boundary (the compositional boundary between two polar phases with different polarization directions, exemplified by the PbZrO3-PbTiO3 system) that both enhances polarization switching and permits canting of the ordered magnetic moments. We expect this strategy to allow the generation of a range of tunable multiferroic materials.

Several approaches to room-temperature multiferroicity have been explored. Composite multiferroics, which are multiphase mixtures of magnetic and ferroelectric materials, have displayed the largest magnetoelectric effects, originating from stress-mediated coupling. The indirect nature of the cross-coupling between the polar and magnetic phases hinders complete switching of the ferroic properties through magnetoelectric coupling. The single-phase oxide BiFeO3 is antiferromagnetically ordered, with competing exchange interactions producing a cycloidal structure with a period of 62 nm (ref. 9). Two approaches have been used to disrupt this cycloid. First, solid solutions of the non-polar, weakly ferromagnetic LnFeO3 (Ln = Sm, Dy, La) ferrites in BiFeO3 have a finite magnetization at room temperature in a fully ordered magnetic network. The inherent trade-off between the soft magnetic properties of the orthoferrite and the ferroelectric properties of BiFeO3 leads to intermediate compositions for which the long-range crystallographic symmetry (polar versus non-polar), the magnetic ground state and switchability are subject to debate. Second, strained and nanostructured BiFeO3 films have shown remanent magnetization, and electrical control of the staggered magnetization in BiFeO3 can switch the magnetization of a coupled ferromagnetic material in a thin-film device structure.

In BiFeO3, the ferroelectric polarization is aligned along the [111]p direction of the primitive cubic ABO3 perovskite subcell (indicated by the subscript p). The morphotropic phase boundary (MPB) between two non-cubic, polar crystallographic symmetries of the ABO3 perovskite with distinct polarization directions is a route to large, switchable polarization via polarization rotation or reorientation. The structure at the MPB is a single perovskite network with a complex domain microstructure, where the Bragg scattering can be modelled in single- or multiple-phase approximations. We have recently produced a new MPB in a solid solution between rhombohedral (R, space group R3c; Fig. 1a) [111]p and orthorhombic (O, space group Pna21; Fig. 1b) [001]p polarization directions in the Bi³⁺-based perovskites $(1-x)\mathrm{BiTi}_{(1-y)/2}\mathrm{Fe}_{y}\mathrm{Mg}_{(1-y)/2}\mathrm{O}_{3}\text{-}(x)\mathrm{CaTiO}_{3}$: the MPB occurs for 0.075 ≤ x < 0.175, y = 0.25 (Fig. 1c). This new MPB affords large switchable polarizations (P) in bulk materials, for example, P = 49 μC cm⁻² for x = 0.15, y = 0.25. These materials were designed to have a high content of d⁰ Ti⁴⁺ and Mg²⁺ cations on the octahedral B site, to minimize dielectric loss and aid ferroelectric switching by sustaining the required electric field. Because the MPB structure is based on a continuous ABO3 network, there is a coherent magnetic B-site sublattice that is connected by B-O-B superexchange pathways throughout each crystallite. The low Fe content of 21.25% at x = 0.15, y = 0.25 is below the percolation threshold for the primitive cubic lattice (Fig. 1e): because magnetic order in insulators arises from nearest-neighbour superexchange, such a material cannot display long-range magnetic order, and none is observed (as demonstrated by the monotonic temperature dependence of the field-cooled (FC) and zero-field-cooled (ZFC) magnetizations and by the linear magnetization isotherm M(H), where H is the magnetic field, at 100 K; Extended Data Fig. 1).
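The percolation argument can be made concrete. The sketch below computes the B-site occupancies implied by the formula (1 − x)BiTi(1−y)/2FeyMg(1−y)/2O3-(x)CaTiO3 and compares the Fe fraction with the site-percolation threshold of the simple cubic lattice, p_c ≈ 0.3116. As a rough check, it also runs a small Monte Carlo test of whether randomly placed Fe sites form a nearest-neighbour path spanning a cubic box. This assumes, as in the schematic of Fig. 1e, f, that Fe is distributed randomly over a simple cubic B-site lattice with only nearest-neighbour superexchange; the box size, seed and function names are illustrative choices, not part of the original analysis.

```python
import random
from collections import deque

P_C_SIMPLE_CUBIC = 0.3116  # site-percolation threshold of the simple cubic lattice


def b_site_fractions(x, y):
    """B-site occupancies for (1-x)BiTi_{(1-y)/2}Fe_yMg_{(1-y)/2}O3-(x)CaTiO3."""
    return {
        "Fe": (1 - x) * y,
        "Mg": (1 - x) * (1 - y) / 2,
        "Ti": (1 - x) * (1 - y) / 2 + x,  # CaTiO3 contributes Ti only
    }


def spans_box(p, size=20, seed=0):
    """True if occupied sites (probability p) link the k = 0 face to the k = size-1 face."""
    rng = random.Random(seed)
    occ = [[[rng.random() < p for _ in range(size)] for _ in range(size)]
           for _ in range(size)]
    queue = deque((i, j, 0) for i in range(size) for j in range(size) if occ[i][j][0])
    seen = set(queue)
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:  # breadth-first search over nearest-neighbour links
        i, j, k = queue.popleft()
        if k == size - 1:
            return True
        for di, dj, dk in steps:
            n = (i + di, j + dj, k + dk)
            if all(0 <= c < size for c in n) and occ[n[0]][n[1]][n[2]] and n not in seen:
                seen.add(n)
                queue.append(n)
    return False


for y in (0.25, 0.60, 0.80):
    fe = b_site_fractions(0.15, y)["Fe"]
    print(f"y = {y:.2f}: Fe = {fe:.4f}, above p_c: {fe > P_C_SIMPLE_CUBIC}, "
          f"spans 20^3 box: {spans_box(fe)}")
```

With x = 0.15, this gives Fe fractions of 0.2125, 0.51 and 0.68 for y = 0.25, 0.60 and 0.80, matching the 21.25% and 51% quoted in the text; only the first lies below p_c. A single small box is noisy close to p_c, so a careful estimate would average over many seeds and box sizes.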

We therefore explored increasing the Fe content ((1 − x) × y) on the B site to generate long-range magnetic order within an MPB system that displays switchable polarization (Fig. 1d, f). A series of compositions in the range x = 0.15, 0.60 ≤ y ≤ 0.90 were prepared, and the perovskite phase purity was confirmed by powder X-ray diffraction (PXRD; Extended Data Fig. 2). The compositions x = 0.15, y = 0.60 and x = 0.15, y = 0.80 were selected for detailed property studies. Pawley refinements on these compositions show that a model with both R and O phases, and a single-phase monoclinic model in a √2a_p × 2a_p × √2a_p unit cell (space group P11a, which is a polar subgroup of both R3c and Pna21; refined lattice parameters are given in Extended Data Table 1), produce superior fits to those obtained with a purely rhombohedral model (Extended Data Fig. 3 and Extended Data Table 1). This result demonstrates that these compositions lie in the MPB region towards the rhombohedral limit, and hence are long-range ordered, polar, non-cubic materials.

Figure 1 | Crystal structure, magnetic percolation and the morphotropic phase boundary (MPB) in $(1-x)\mathrm{BiTi}_{(1-y)/2}\mathrm{Fe}_{y}\mathrm{Mg}_{(1-y)/2}\mathrm{O}_{3}\text{-}(x)\mathrm{CaTiO}_{3}$, where 0 ≤ x ≤ 0.35 and 0.25 ≤ y ≤ 0.90. a, Schematic diagram of the purely rhombohedral (R3c) structure for x = 0.05, y = 0.25, represented in the cubic perovskite subcell with polar displacement of Bi along the [111]p axis. The blue, orange and red spheres indicate Bi/Ca, Fe/Ti/Mg and O, respectively. b, The purely orthorhombic (Pna21) structure for 0.175 ≤ x ≤ 0.35, y = 0.25, with polar displacement of Bi along the [001]p axis. c, The ferroelectric MPB observed for 0.075 ≤ x < 0.175, y = 0.25, shown with superimposed polar displacements of Bi along the [111]p and [001]p axes. d, Long-range magnetic order at 300 K for x = 0.15, y = 0.80, with a proposed magnetic structure (orange arrows) based on a G-type antiferromagnetic arrangement and spins oriented perpendicular to the [111]p polarization direction. e, f, Schematic diagrams of the MPB microstructure and nearest-neighbour magnetic exchange pathways for x = 0.15, y = 0.25 (e) and for x = 0.15, y = 0.80 (f). Each square represents a perovskite unit cell (rhombohedral in purple and orthorhombic in green); brown dots are distributed randomly to represent unit cells containing Fe, and the associated brown lines represent magnetic exchange pathways. A percolating exchange pathway spanning the sample is absent in e but present in f. g-i, Pawley fits (red lines) to PXRD data (black circles) in the angular range 36.0° ≤ 2θ ≤ 38.5° for composition x = 0.15, y = 0.80, modelled using a single R3c unit cell (g), superimposed R3c and Pna21 unit cells (h) and a single monoclinic P11a unit cell (i); see Extended Data Fig. 3 for full patterns. Teal line, difference between the measured and fitted data; purple markers, hkl (R3c) reflections; green markers, hkl (Pna21) reflections; magenta markers, hkl (P11a) reflections.

Figure 2 | Ferroelectric, magnetic and magnetoelectric properties of composition x = 0.15, y = 0.60. a, Polarization P versus applied electric field E at 300 K showing ferroelectric switching, measured at 10 Hz. Filled squares and circles represent the remanent polarizations from PUND measurements. Each colour represents the maximum applied electric field in the P(E)/PUND measurements; dotted lines are included as visual aids. b, Mössbauer spectrum measured at 300 K with no applied magnetic field (black circles) and a single paramagnetic component fit (red line). c, dc magnetization measurement of TRM (blue line) and ZFC/FC (black/red lines) magnetization. The red dotted line indicates T = T_N. The FC and ZFC data converge only at the highest temperature, because the small divergence above the Néel temperature arises from an impurity phase with an ordering temperature above the highest measured T. d, Isothermal magnetization M(H) at 10 K (black circles) and the sum of the perovskite and spinel impurity phase contributions (red line). e, Temperature dependence of the linear magnetoelectric susceptibility (α). The data points (blue squares) are the mean values from 10 repeated measurements, with standard errors shown in red. The red dotted line is T = T_N. f, Induced ac magnetization (M_ac) versus applied ac electric field amplitude (E_ac) at 150 K (black squares) and 300 K (red circles). The data points are the mean values from 10 repeated measurements, with standard errors shown in red; the blue lines are linear fits to the data.

Figure 3 | Ferroelectric, magnetic and magnetoelectric properties of composition x = 0.15, y = 0.80. a, Polarization P versus applied electric field E at 300 K showing ferroelectric switching, measured at 10 Hz. Filled squares and circles represent the remanent polarizations from PUND measurements. Each colour represents the maximum applied electric field in the P(E)/PUND measurements; dotted lines are included as visual aids. b, Mössbauer spectrum measured at 300 K with no applied magnetic field (black circles) and the multicomponent fit (red line). Individual components 1-4 (green, blue, cyan and magenta, respectively) are described in the text and summarized in Extended Data Table 2. c, dc magnetization measurement of TRM (blue line) and ZFC/FC (black/red lines) magnetization. The red dotted line indicates T = T_N. d, Isothermal magnetization M(H) at 300 K (black circles) and the sum of the perovskite and minority-phase spinel contributions (red line). e, Temperature dependence of the linear magnetoelectric susceptibility (α). The data points (blue squares) are the mean values from 10 repeated measurements, with standard errors shown in red. The red dotted line is T = T_N. f, Induced ac magnetization (M_ac) versus applied ac electric field amplitude (E_ac) at 300 K (red circles). The data points are the mean values from 10 repeated measurements, with standard errors shown in red; the blue line is a linear fit to the data.

The x = 0.15, y = 0.60 material has 51% Fe on the B site, which is above the percolation threshold for long-range magnetic order, and has low dielectric loss despite the enhanced d-electron content (Extended Data Fig. 4a). It retains the polarization switching characteristics of the MPB and is ferroelectric at room temperature, with a maximum polarization (P_max) of 47.1 μC cm⁻² (Fig. 2a). Positive-up negative-down (PUND) measurements confirm the intrinsic nature of the measured polarization, with a remanent polarization of 41.5 μC cm⁻². The 300 K Mössbauer spectrum, which probes all the Fe nuclei in the sample, is a sharp paramagnetic doublet (Fig. 2b), showing that the material is not magnetically ordered at room temperature. The good fit to the data with a single paramagnetic component (hyperfine field B_hf = 0) excludes any magnetically ordered impurity phases with concentrations higher than 2 wt%. The isomer shift of δ = 0.22(3) mm s⁻¹ (where the number in parentheses represents the standard error) corresponds to Fe³⁺ in a homogeneous, distorted octahedral environment. The loop observed in the M(H) isotherm at 300 K, which has a small coercive field, is therefore associated with trace amounts (below diffraction detection limits) of Fe-rich ferrimagnetic impurities (for example, Fe3O4 or MgFe2O4 spinels; Extended Data Fig. 5), and does not correspond to long-range order of the perovskite. Magnetic ordering in the perovskite below 350 K was probed by dc-SQUID (superconducting quantum interference device) ZFC and FC magnetization and thermal remanent magnetization (TRM) measurements (Fig. 2c). The large divergence between the ZFC and FC data indicates the onset of weak ferromagnetism at the Néel temperature, T_N = 205 K, consistent with the Brillouin-like drop in the TRM. No other sign of magnetic ordering is observed at lower temperature (Extended Data Fig. 6a), suggesting that, at the MPB, the perovskite behaves as a single-phase magnetic material; this is consistent with the sharpness of the Mössbauer spectrum. The M(H) isotherm at 10 K (Fig. 2d) can be decomposed into two components: a soft magnetic phase, associated with a trace amount (approximately 0.6 wt%, consistent with the 300 K measurement shown in Extended Data Fig. 5c) of the Fe-rich spinel ferrite impurity, and a harder phase with an open hysteresis loop and a linear high-field contribution, which are characteristic features of a weak ferromagnet (Extended Data Fig. 5a). This harder magnetic phase is attributed to the perovskite compound, with an extracted coercive field of 376 mT and a saturation magnetization of 0.013 μB per Fe, confirming that the material is a weak ferromagnet in which the magnetization arises from ferromagnetic canting of a predominantly antiferromagnetic magnetic structure. The structural symmetries present at the MPB all permit
canting to occur within the G-type antiferromagnetic arrangement 
that is generally found for perovskite ferrites” (Fig. 1d). 

To confirm whether the two order parameters P and M are coupled, magnetoelectric measurements were performed on a disk that was poled both electrically and magnetically. The x = 0.15, y = 0.60 material displays linear magnetoelectric coupling (measured as the slope of the induced ac magnetization (M_ac) versus the applied ac electric field amplitude (E_ac)) only below the long-range ordering temperature of 205 K (Fig. 2e, f). At 10 K, the material shows a pronounced magnetoelectric susceptibility α (= μ0M_ac/E_ac, where μ0 is the vacuum permeability) of −1.11(1) ps m⁻¹ (Extended Data Fig. 7; the number in parentheses is the standard error), which changes sign upon warming to T_N (ref. 26). The residual magnetoelectric coupling at 300 K is an order of magnitude smaller than that in the magnetically ordered state (Fig. 2f compares data below T_N at 150 K and above T_N at 300 K) and can be associated with composite effects involving the magnetic minority phases that are not integrated into the complex MPB microstructure of the ABO3 perovskite network.
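A short dimensional check (our addition, not part of the original analysis) shows why α is quoted in ps m⁻¹: μ0M_ac is a flux density in tesla (1 T = 1 V s m⁻²) and E_ac is in V m⁻¹.

```latex
\alpha = \frac{\mu_0 M_{\mathrm{ac}}}{E_{\mathrm{ac}}}, \qquad
[\alpha] = \frac{\mathrm{T}}{\mathrm{V\,m^{-1}}}
         = \frac{\mathrm{V\,s\,m^{-2}}}{\mathrm{V\,m^{-1}}}
         = \mathrm{s\,m^{-1}}, \qquad
1\ \mathrm{ps\,m^{-1}} = 10^{-12}\ \mathrm{s\,m^{-1}}
```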

The 51% Fe content at x = 0.15, y = 0.60 is sufficient to percolate and give long-range magnetic order, but the mean exchange field is too weak for room-temperature magnetization, because the effective number of nearest neighbours for superexchange is too low. The x = 0.15, y = 0.80 composition gives 68% B-site occupancy by Fe3+. Mössbauer spectroscopy demonstrates that this increased coverage produces bulk magnetic order at 300 K (Fig. 3b), in contrast to x = 0.15, y = 0.60, because there are no paramagnetic contributions to the spectrum (Extended Data Table 2). ZFC/FC magnetization and TRM measurements (Fig. 3c) show that T_N increases to 370 K. The majority (98.6(2)%) components 1 and 2 arise from the magnetically ordered MPB perovskite. Component 1 corresponds to Fe3+ in a slightly distorted octahedral environment (δ = 0.29(5) mm s⁻¹, electric quadrupole moment Q = 0.033(7) mm s⁻¹). The broader local field distribution and reduced hyperfine field in component 2 reflect the different local magnetic environments in a percolating system. The minority (1.3(2)%) components 3 and 4 arise from spinel-derived Fe3+ (δ = 0.3 mm s⁻¹). There is no signature of further magnetic ordering at lower temperature in the TRM plot (Extended Data Fig. 6b). The 300 K (below T_N) M(H) isotherm for the x = 0.15, y = 0.80 composition is similar to that observed for x = 0.15, y = 0.60 in the magnetically ordered state at 10 K: there are two components, a soft phase that is attributed to the high-Fe-content impurity (approximately 0.7 wt%), and a harder phase that is assigned to the perovskite, with a coercive field (367 mT) and remanent magnetization (0.008 μB per Fe) consistent with bulk weak ferromagnetic behaviour (Fig. 3d and Extended Data Fig. 5b). The x = 0.15, y = 0.80 material is ferroelectric at room temperature, with a switchable maximum polarization (P_max) of 49.9 μC cm⁻² (Fig. 3a); a ferroelectric polarization is still measurable at 473 K (Extended Data Fig. 8). The remanent polarization obtained from PUND measurement is 43.7 μC cm⁻² (Fig. 3a and Extended Data Fig. 4c). PUND and leakage current measurements confirm the intrinsic origin of the polarization, consistent with the 300 K dc resistivity of 2.1 x 10'7 Ω cm (Extended Data Fig. 4d). The switching of the intrinsic perovskite weak ferromagnetic magnetization at the bulk coercive field thus coexists with the switching of the ferroelectric polarization at room temperature.
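The 51% and 68% figures follow from the nominal composition: only the (1 − x) Bi-based end member carries Fe, on a fraction y of its B sites, while CaTiO3 contributes Ti only. A minimal Python sketch (function names are ours, not the paper's) that reproduces both numbers and checks that the Ti4+/Mg2+ pair on the remaining B sites keeps the average B-site charge of the Bi end member at 3+:

```python
def fe_b_site_fraction(x: float, y: float) -> float:
    """Fraction of perovskite B sites occupied by Fe3+ in
    (1 - x)BiTi(1-y)/2 Fe(y) Mg(1-y)/2 O3 - (x)CaTiO3.

    Fe sits on a fraction y of the B sites of the Bi-based end member,
    which makes up (1 - x) of the solid solution; CaTiO3 adds only Ti.
    """
    return (1.0 - x) * y


def bi_end_member_b_site_charge(y: float) -> float:
    """Average B-site charge of BiTi(1-y)/2 Fe(y) Mg(1-y)/2 O3 (Ti4+, Fe3+, Mg2+)."""
    return 4 * (1 - y) / 2 + 3 * y + 2 * (1 - y) / 2


for y in (0.60, 0.80):
    frac = fe_b_site_fraction(0.15, y)
    print(f"x = 0.15, y = {y:.2f}: {frac:.0%} of B sites hold Fe3+")
```

Because the Ti4+/Mg2+ pair averages to 3+, this nominal bookkeeping keeps the B site charge-balanced for any y, so the Fe fraction can be tuned without assuming mixed valence.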

The long-range ordered P and M in x = 0.15, y = 0.80 afford bulk magnetoelectric coupling at room temperature, with a linear magnetoelectric susceptibility (α) of 0.26(1) ps m⁻¹ (Fig. 3f). Variable-temperature measurements (Fig. 3e) show that α is −0.91(1) ps m⁻¹ at 10 K, with a change of sign upon heating similar to that found for x = 0.15, y = 0.60. The linear magnetoelectric susceptibility tends to zero at the bulk T_N (Fig. 3e), demonstrating that it arises from interaction between the coexisting magnetic and electric long-range orders. The x = 0.15, y = 0.80 material, with 68% Fe3+ B-site occupancy (and thus a percolating network of Fe–O–Fe superexchange paths to give long-range magnetic order), is a room-temperature magnetoelectric ferromagnetic ferroelectric material. The introduction of high-temperature long-range magnetic order into MPB systems is a diversifiable strategy for the generation of tunable multiferroic materials.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

Sample preparation. Powder samples of (1 − x)BiTi(1−y)/2FeyMg(1−y)/2O3–(x)CaTiO3, in the compositional range x = 0.15, 0.60 ≤ y ≤ 0.90, were synthesized by a conventional solid-state reaction. The starting materials Bi2O3 (99.99% Alfa Aesar, pre-dried at 473 K), CaCO3 (99.997% Alfa Aesar, pre-dried at 473 K), Fe2O3 (99.998% Alfa Aesar, pre-dried at 473 K), TiO2 (99.995% Alfa Aesar, pre-dried at 473 K) and MgCO3·Mg(OH)2·xH2O (x ≈ 3, 99.995% Alfa Aesar, used as received) were weighed in stoichiometric amounts and ball milled in ethanol for 20 h. The mixtures obtained after evaporating the ethanol were pelletized and calcined at 1,208 K for 12 h in a platinum-lined alumina crucible. These pellets were then reground thoroughly, re-pelletized and subjected to a second calcination at 1,213 K for 12 h in platinum-lined alumina crucibles. The resulting powders were found to contain only the target phase, with no minority phases visible by PXRD. Dense pellets (>95% of the crystallographic density) suitable for property measurements were produced from these powders by the following protocol. First, 2 wt% polyvinyl butyral binder and 0.2 wt% MnO2 were added to the samples, and this mixture was ball milled for 20 h. Second, the resultant mixture was pelletized (8 mm diameter) with a uniaxial press, followed by pressing at about 2 x 10° Pa in a cold isostatic press. Third, these pellets were loaded into a platinum-lined alumina boat. Finally, a programmable tube furnace was used to heat the pellets under flowing oxygen to 943 K for 1 h, followed by 1,228 K for 3 h and 1,173 K for 12 h, before cooling to room temperature at 5 K min⁻¹. The resultant pellets were found to contain no minority phases by PXRD. Their densities were measured using an Archimedes balance.
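The >95% criterion compares the Archimedes bulk density with the crystallographic density. A minimal Python sketch under stated assumptions: Z = 4 nominal ABO3 units in the √2a_p × 2a_p × √2a_p P11a cell (its volume is about four pseudocubic perovskite cells), nominal stoichiometry, standard atomic weights, and the MnO2 and binder additions neglected. The helper names are ours, and because no measured weights are given in the text, none are used.

```python
N_A = 6.02214076e23  # Avogadro constant, mol^-1
ATOMIC_MASS = {"Bi": 208.980, "Ca": 40.078, "Ti": 47.867,
               "Fe": 55.845, "Mg": 24.305, "O": 15.999}  # g/mol


def formula_weight(x: float, y: float) -> float:
    """Molar mass (g/mol) of one nominal ABO3 unit of
    (1 - x)BiTi(1-y)/2 Fe(y) Mg(1-y)/2 O3 - (x)CaTiO3 (additives ignored)."""
    m = ATOMIC_MASS
    a_site = (1 - x) * m["Bi"] + x * m["Ca"]
    b_site = ((1 - x) * ((1 - y) / 2 * m["Ti"] + y * m["Fe"] + (1 - y) / 2 * m["Mg"])
              + x * m["Ti"])
    return a_site + b_site + 3 * m["O"]


def crystallographic_density(x: float, y: float, cell_volume_A3: float, z: int = 4) -> float:
    """Density in g/cm^3 for z formula units in a cell volume given in cubic angstrom."""
    return z * formula_weight(x, y) / (N_A * cell_volume_A3 * 1e-24)


def archimedes_density(m_dry: float, m_immersed: float, rho_liquid: float) -> float:
    """Bulk density from dry and immersed weights (same units) and the liquid density."""
    return m_dry * rho_liquid / (m_dry - m_immersed)


# Cell volume for x = 0.15, y = 0.60 taken from Extended Data Table 1.
rho_x = crystallographic_density(0.15, 0.60, 246.56)
print(f"crystallographic density ~ {rho_x:.2f} g/cm^3")
# Relative density would be archimedes_density(...) / rho_x.
```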

Powder X-ray diffraction (PXRD). All data were collected using a PANalytical X'Pert Pro diffractometer in Bragg–Brentano geometry with a monochromated Co Kα1 source (wavelength λ = 1.78896 Å) and a position-sensitive X'Celerator detector. Each sample was contained in a back-filled sample holder and rotated during the measurement. A programmable divergence slit was used to provide a constant illuminated area throughout the angular range. Data were collected in the angular range 5° ≤ 2θ ≤ 130° in steps of 0.0167°. Pawley refinements were carried out using the software package Topas Academic (version 5). For each PXRD pattern, the background was modelled using a Chebyshev polynomial function with 12 refined parameters. Lattice parameters, a sample height correction, peak profile functions and model-independent peak intensities were refined. Peak profiles were modelled with a modified Thompson–Cox–Hastings pseudo-Voigt function. When fitting data to a single phase (R3c or P11a cells), a Stephens anisotropic strain broadening function was refined. In two-phase (R3c + Pna21) refinements, this function was refined only for the rhombohedral (R3c) phase.
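To make the profile description concrete, here is a simplified Python sketch of a calculated pattern built from a Chebyshev background and pseudo-Voigt peaks. It is a stand-in under assumptions, not the Topas implementation: the Thompson–Cox–Hastings variant used in the refinements derives the Gaussian and Lorentzian widths and the mixing parameter from angle-dependent refined parameters, whereas here one FWHM and one mixing fraction `eta` are given per peak, and mapping 5°–130° onto [−1, 1] for the polynomial is our choice. A Pawley fit would refine such quantities (for example 12 background coefficients) against the observed intensities.

```python
import math


def chebyshev_background(two_theta, coeffs, lo=5.0, hi=130.0):
    """Background sum_k c_k T_k(u), with 2-theta in [lo, hi] mapped onto u in [-1, 1]."""
    u = (2.0 * two_theta - (lo + hi)) / (hi - lo)
    total = 0.0
    t_prev, t_curr = 1.0, u  # T_0(u) and T_1(u)
    for k, c in enumerate(coeffs):
        if k == 0:
            total += c * t_prev
        elif k == 1:
            total += c * t_curr
        else:
            # Recurrence: T_k = 2u T_(k-1) - T_(k-2)
            t_prev, t_curr = t_curr, 2.0 * u * t_curr - t_prev
            total += c * t_curr
    return total


def pseudo_voigt(x, x0, fwhm, eta):
    """Area-normalised pseudo-Voigt: eta * Lorentzian + (1 - eta) * Gaussian, common FWHM."""
    hwhm = fwhm / 2.0
    lorentz = hwhm / (math.pi * ((x - x0) ** 2 + hwhm ** 2))
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    gauss = math.exp(-((x - x0) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
    return eta * lorentz + (1.0 - eta) * gauss


def model_pattern(two_theta, bg_coeffs, peaks):
    """Background plus peaks; each peak is a tuple (position, area, fwhm, eta)."""
    y = chebyshev_background(two_theta, bg_coeffs)
    for pos, area, fwhm, eta in peaks:
        y += area * pseudo_voigt(two_theta, pos, fwhm, eta)
    return y
```

In a Pawley refinement the peak positions are not free: they follow from the lattice parameters and space group, while the peak areas are the model-independent intensities.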

Electrical measurements. For electric poling, gold was sputtered on both sides of thin disks (thickness of 130–160 μm with a tolerance of 10 μm). For P(E) and PUND measurements, silver conductive paint (RS Components) was applied on both sides of thin disks and cured at 393 K for 10 min. The edges were bevelled by approximately 0.2 mm to avoid electrical breakdown. The area of the electrode was measured under an optical microscope equipped with a camera and measurement software. The disk was loaded in a Radiant high-voltage test fixture. Silicone oil was used as a dielectric medium to avoid air breakdown. P(E) measurements were conducted using a Radiant ferroelectric tester system and an aixACCT piezoelectric evaluation system (aixPES). PUND measurements were carried out using the Radiant ferroelectric tester system with square electric field pulses with a delay of 500 ms and pulse widths of 5 ms (x = 0.15, y = 0.60) and 8 ms (x = 0.15, y = 0.80). The remanent polarizations for positive (dP/2) and negative (−dP/2) applied electric fields are calculated as dP/2 = (P* − P^)/2 and −dP/2 = (−P* − (−P^))/2, respectively, where P* contains both remanent and non-remanent polarization, whereas P^ contains only the non-remanent polarization. P*_r and P^_r are the equivalent polarizations of P* and P^, respectively, measured when the applied electric field is reduced to zero following the pulse.
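The PUND bookkeeping above amounts to halving the difference between the switching (P*) and non-switching (P^) responses for each pulse polarity. A minimal Python sketch; the function name is hypothetical and the inputs in the usage lines are made-up numbers, not measured values:

```python
def pund_half_remanent(p_switching: float, p_nonswitching: float) -> float:
    """dP/2 = (P* - P^)/2 for one pulse polarity.

    p_switching (P*): remanent + non-remanent polarization from the switching pulse.
    p_nonswitching (P^): non-remanent polarization from the following same-sign pulse.
    Passing the negative-pulse values (-P*, -P^) gives -dP/2.
    """
    return (p_switching - p_nonswitching) / 2.0


# Illustrative numbers only (uC/cm^2), not data from this work.
print(pund_half_remanent(60.0, 20.0))    # 20.0, positive pulses
print(pund_half_remanent(-60.0, -20.0))  # -20.0, negative pulses
```

The same function applies to the zero-field pair (P*_r, P^_r), which isolates the switchable part of the polarization that remains once the field is removed.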

Leakage current measurements. Leakage current was measured in an aixPES instrument using a triangular waveform in steps of 25 V with a step duration of 2 s. A switching prepolarization pulse was applied before the actual measurements.
Resistivity measurements. Resistivity was measured using the two-probe method in a Magnetic Property Measurement System (MPMS) XL-7 SQUID magnetometer (Quantum Design). The pellet was loaded into a modified dc-SQUID probe and connected to a Keithley 6430 sub-femtoamp remote SourceMeter.
Impedance measurements. Impedance and phase angles were measured using an Agilent E4980 LCR meter by applying an ac voltage of 0.5 V in the frequency range 20 Hz to 2 MHz.

Magnetic measurements. Magnetic measurements were carried out using MPMS XL-7 and MPMS3 systems (Quantum Design). For this, powder or pellet samples were loaded into a polycarbonate capsule, fixed into a straight plastic drinking straw and then loaded into a dc-SQUID probe. The Néel temperature (T_N) was determined from the peak of dM_TRM/dT. The isothermal dc magnetization data were decomposed using the general function M(H) = Σ_i m_i(H), where the m_i are generic functions describing single magnetic components, taking the form m_i(H) = a tanh((H ± b)/c) + dH. Here a represents the saturation magnetization, b the coercive field, c a parameter that describes the squareness of the loop, and d the coefficient of a linear term that includes paramagnetic, diamagnetic and antiferromagnetic contributions for the individual magnetic component. Above the magnetic ordering temperature of the perovskite phase, only one component, assigned to a high-Fe-content impurity, was used to describe the isothermal magnetization. Below the perovskite magnetic ordering temperature, two components were used.
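Both analysis steps can be sketched in a few lines of Python. This is an illustration with hypothetical helper names, not the authors' code: the branch sign convention (increasing-field branch crossing zero at H = +b) is our assumption, "peak of dM_TRM/dT" is read as the largest-magnitude slope of the decreasing TRM curve, and the least-squares refinement of a, b, c and d is not shown.

```python
import math


def branch_moment(h, a, b, c, d, increasing=True):
    """One magnetic component on one field-sweep branch: a*tanh((H -/+ b)/c) + d*H.

    Assumed sign convention: the increasing-field branch crosses zero at H = +b,
    the decreasing-field branch at H = -b.
    """
    shift = -b if increasing else b
    return a * math.tanh((h + shift) / c) + d * h


def total_moment(h, components, increasing=True):
    """M(H) = sum_i m_i(H); each component is a dict with keys a, b, c, d."""
    return sum(branch_moment(h, increasing=increasing, **p) for p in components)


def neel_temperature(temps, m_trm):
    """Temperature at which the TRM drops most steeply (extremum of dM_TRM/dT),
    using central differences at the interior points."""
    best_t, best_slope = None, 0.0
    for i in range(1, len(temps) - 1):
        slope = (m_trm[i + 1] - m_trm[i - 1]) / (temps[i + 1] - temps[i - 1])
        if best_t is None or abs(slope) > abs(best_slope):
            best_t, best_slope = temps[i], slope
    return best_t
```

Below T_N, a soft impurity component with a small b and a harder perovskite component would be passed together as a two-element list to `total_moment`; above T_N the list would hold the impurity component alone.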
Mössbauer spectroscopy. Absorption-mode Mössbauer spectroscopy measurements were performed at room temperature, using an electromagnetic Doppler drive system, a 57Co(Rh) γ-ray source with an actual activity of about 20 mCi and a Xe-gas Reuter Stokes proportional counter, and Canberra amplification, discrimination and scaling electronics. For the measurements, samples were diluted with sucrose (icing sugar) at an approximate ratio of 0.2, to prevent excessive line-shape distortion and non-resonant absorption owing to the high bismuth content of the samples. Custom modelling and nonlinear least-squares error minimization routines were used for the extraction of the spectroscopic parameters. Isomer shifts are reported with respect to the source.
Magnetoelectric measurements. Details of the magnetoelectric measurement set-up (ref. 28) and protocol are described elsewhere. Note that the load resistance, mentioned and used in ref. 28 to obtain a suitable voltage drop from the MPMS ac coil power supply, was omitted in our experiments; instead, the MPMS ac coil power supply was directly connected to the input of a Krohn-Hite 7600M wideband power amplifier. In this experiment, a sinusoidal electric field E = E_ac cos(ωt) (where ω = 2πf, with f the frequency, and E_ac is the electric field amplitude) is applied across the disk and the first harmonic of the complex ac magnetic moment, m(t) = (m′ − im″) cos(ωt), is measured. The measurements were performed in the absence of any dc magnetic or electric fields. In this scenario, the real part of the electrically induced magnetic moment is m′ = αE_acV/μ0, where V is the sample volume. This moment involves only the linear magnetoelectric (α) effect; the higher-order effects are zero. The corresponding electrically induced volume ac magnetization is defined as M_ac = m′/V. To demonstrate the linear magnetoelectric effect in x = 0.15, y = 0.60 and x = 0.15, y = 0.80, the electric field amplitude E_ac was varied and the induced moment was recorded. The linear magnetoelectric susceptibility (α) was calculated from a plot of the volume ac magnetization amplitude M_ac (= m′/V) versus E_ac: α = μ0ΔM_ac/ΔE_ac (ref. 29). All measurements were performed at f = 1 Hz with 20 blocks to average and 10 scans per measurement. The sensitivity of the experimental set-up used here is |m′| = V × M_ac > 5X10 %Am ~. Prior to magnetoelectric measurements, disks were poled externally using the aixPES instrument at a field of 100 kV cm⁻¹ for 15 min, from 343 K to room temperature. Disks were then loaded into a modified dc-SQUID probe at 300 K and subjected to a magnetic field of 2 T for 30 min. After the removal of the electric and magnetic fields, the electrodes were short-circuited for 15 min before conducting magnetoelectric measurements at 300 K. For magnetoelectric measurements at 10 K and 150 K (for x = 0.15, y = 0.60), the sample was cooled to the measurement temperature in the presence of an electric field (3.5 kV cm⁻¹) and a magnetic field (2 T), and the protocol for the 300 K measurement was then followed. To determine the temperature dependence of α, an electric field (3.5 kV cm⁻¹ for y = 0.60 and 2.7 kV cm⁻¹ for y = 0.80) and a magnetic field of 2 T were applied at 300 K, followed by cooling to 10 K at a rate of 1 K min⁻¹; the data were collected at 1 Hz. The temperature was stabilized for 5 min at each step before measurement. The room-temperature bulk dc resistivity of x = 0.15, y = 0.60 is 3.3 X 10!* Ω cm, and that of x = 0.15, y = 0.80 is 2.1 X 10’* Ω cm. The leakage currents observed for y = 0.60 and y = 0.80 are 0.35 nA (320 K) and 11.4 nA (360 K), respectively, at the maximum measurement fields. These values are too low to cause artefacts in the magnetoelectric measurements. The upper temperature limit of this measurement set-up is 360 K.
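The slope extraction can be sketched as an ordinary least-squares line through the (E_ac, M_ac) points, scaled by μ0. This Python sketch assumes SI inputs (E_ac in V m⁻¹, where 1 kV cm⁻¹ = 10⁵ V m⁻¹, and M_ac in A m⁻¹) and uses hypothetical helper names; it ignores the per-point standard errors shown in the figures, which a weighted fit would use.

```python
import math

# Vacuum permeability in T m/A (classical value; the post-2019 SI value differs
# only at the 1e-10 relative level).
MU_0 = 4e-7 * math.pi


def linear_me_susceptibility(e_ac, m_ac):
    """alpha = mu0 * dM_ac/dE_ac from an ordinary least-squares straight line.

    e_ac: ac electric-field amplitudes (V/m); m_ac: induced ac magnetization (A/m).
    Returns (alpha in s/m, intercept in A/m).
    """
    n = len(e_ac)
    mean_e = sum(e_ac) / n
    mean_m = sum(m_ac) / n
    sxx = sum((e - mean_e) ** 2 for e in e_ac)
    sxy = sum((e - mean_e) * (m - mean_m) for e, m in zip(e_ac, m_ac))
    slope = sxy / sxx
    return MU_0 * slope, mean_m - slope * mean_e


def induced_moment(alpha, e_ac, volume):
    """Real part of the induced moment, m' = alpha * E_ac * V / mu0 (A m^2, V in m^3)."""
    return alpha * e_ac * volume / MU_0


def to_ps_per_m(alpha_s_per_m):
    """Convert s/m to the ps/m units used in the text."""
    return alpha_s_per_m * 1e12
```

A near-zero intercept is a quick sanity check that the induced signal is linear in E_ac, as expected when only the linear effect contributes.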


28. Borisov, P., Hochstrat, A., Shvartsman, V. V. & Kleemann, W. Superconducting 
quantum interference device setup for magnetoelectric measurements. Rev. Sci. 
Instrum. 78, 106105 (2007). 

29. Schmid, H. Some symmetry aspects of ferroics and single phase multiferroics. J. Phys. Condens. Matter 20, 434201 (2008).


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Extended Data Figure 1 plots: magnetization versus T (K), left; magnetization versus μ0H (T), right]

Extended Data Figure 1 | Magnetic properties of composition x = 0.15, y = 0.25. Left, magnetization versus temperature, cooled in zero applied field (ZFC, black line), cooled in 1 mT applied field (FC, red line) and the thermal remanent magnetization in zero applied field (TRM, blue line). Note that the negative TRM curve is due to a negative remanent magnetic field in the superconducting magnet. Right, magnetization versus magnetic field at 100 K.
Extended Data Figure 2 | PXRD patterns obtained from six compositions of the series (1 − x)BiTi(1−y)/2FeyMg(1−y)/2O3–(x)CaTiO3, where x = 0.15, 0.60 ≤ y ≤ 0.90. The weak reflection marked with the + symbol, which is visible in the y = 0.70 and y = 0.75 patterns, corresponds to the most intense reflection of sillenite (Bi25FeO40). All other peaks are indexed to the target perovskite phase using rhombohedral, rhombohedral + orthorhombic, or monoclinic cells, as discussed in the text.
Extended Data Figure 3 | Pawley fits to PXRD patterns collected from two compositions of the series (1 − x)BiTi(1−y)/2FeyMg(1−y)/2O3–(x)CaTiO3. a–f, x = 0.15, y = 0.60 (a–c) and x = 0.15, y = 0.80 (d–f), modelled as a single rhombohedral phase in space group R3c (a, d), as a combination of rhombohedral (R3c) and orthorhombic (Pna21) phases (b, e) and as a single monoclinic phase in space group P11a, which is a subgroup of R3c and Pna21 (c, f). Black circles, y_obs; red line, y_calc; teal line, y_obs − y_calc; blue markers, hkl (R3c) reflections; green markers, hkl (Pna21) reflections; magenta markers, hkl (P11a) reflections. Insets are zooms of the main plots.
Extended Data Figure 4 | Dielectric, polarization and leakage characteristics. a, Frequency dependence of dielectric permittivity (left axis, dashed line) and loss (right axis, solid line) at 300 K for x = 0.15, y = 0.60 (black) and x = 0.15, y = 0.80 (red). b, A typical P(E) loop (right axis, blue line) with the corresponding current density (Jpg; left axis, black line) and the leakage current density (J;; left axis, red line) for x = 0.15, y = 0.80. c, The polarization (blue line, left axis) and electric field profile (red dotted line, right axis) from PUND measurement of x = 0.15, y = 0.80 (see Methods for details). d, Temperature dependence of dc resistivity of x = 0.15, y = 0.80, showing highly insulating behaviour. In a–c, the arrows point to the relevant axis for each curve.
Extended Data Figure 5 | Isothermal magnetization M(H). a, b, x = 0.15, y = 0.60 at T = 10 K < T_N (a) and x = 0.15, y = 0.80 at T = 300 K < T_N (b). The experimental data are represented as black filled circles. Red lines show the sum of the perovskite phase (blue line) and spinel impurity phase (green dashed line) contributions. c, x = 0.15, y = 0.60 at T = 300 K > T_N and x = 0.15, y = 0.80 at T = 395 K > T_N. The experimental data are represented as open circles (x = 0.15, y = 0.60) or squares (x = 0.15, y = 0.80); green dash-dotted and dashed lines show extracted spinel impurity contributions for x = 0.15, y = 0.60 and x = 0.15, y = 0.80, respectively; red lines show fits to the data.
Extended Data Figure 6 | Thermal remanent magnetization data. a, b, Thermal remanent magnetization (TRM; left axis, black circles) and the derivative of TRM with respect to temperature (dM_TRM/dT; right axis, blue lines) for x = 0.15, y = 0.60 (a) and x = 0.15, y = 0.80 (b). Arrows indicate the axis to which each dataset corresponds.
Extended Data Figure 7 | Linear magnetoelectric effect for x = 0.15, y = 0.60 at 10 K. Red squares are mean values; error bars in red are standard errors from 10 repeated measurements. The blue line is a linear fit to the data.
Extended Data Figure 8 | P(E) measurements above room temperature. a, b, Measurements for x = 0.15, y = 0.60 at frequency f = 100 Hz (a) and x = 0.15, y = 0.80 at frequency f = 150 Hz (b) at 473 K.
Extended Data Table 1 | Refined lattice parameters and agreement factors from Pawley fits to PXRD data

Composition | a (Å) | b (Å) | c (Å) | γ (°) | Volume (Å³) | Rwp, R3c | Rwp, R3c + Pna21 | Rwp, P11a | χ², R3c | χ², R3c + Pna21 | χ², P11a
x = 0.15, y = 0.60 | 5.6037(3) | 7.9047(6) | 5.5666(1) | 89.433(7) | 246.56(2) | 7.421 | 6.373 | 6.169 | 2.104 | 1.583 | 1.499
x = 0.15, y = 0.80 | 5.6019(7) | 7.903(1) | 5.5641(1) | 89.40(1) | 246.32(5) | 6.883 | 6.136 | 5.969 | 1.821 | 1.477 | 1.392

Refined lattice parameters (a, b, c, γ) and the corresponding unit cell volumes, obtained by fitting to a √2a_p × 2a_p × √2a_p unit cell in space group P11a, and agreement factors (weighted profile R-factor, Rwp, and goodness of fit, χ²) from Pawley fits to PXRD data fitted in three different candidate space groups, for compositions x = 0.15, y = 0.60 and x = 0.15, y = 0.80.
Extended Data Table 2 | Spectroscopic parameters from Mössbauer data fitting of x = 0.15, y = 0.80 at 300 K

Component | Area (%) | δ (mm s⁻¹) | Q (mm s⁻¹) | B_hf (T)
1 | 30.8(1) | 0.29(5) | 0.033(7) | 46(1)
2 | 67.8(1) | 0.31(5) | −0.003(2) | 23(4)
3 | 0.6(1) | 0.30(5) | −0.063(4) | 33.5(5)
4 | 0.7(1) | 0.30(5) | −0.32(1) | 41.7(5)

The area, isomer shift (δ), electric quadrupole moment (Q) and hyperfine field (B_hf) for different components, extracted from a multicomponent fit, with standard errors in parentheses.
The contribution of outdoor air pollution sources to premature mortality on a global scale

J. Lelieveld, J. S. Evans, M. Fnais, D. Giannadaki & A. Pozzer


Assessment of the global burden of disease is based on epidemiological cohort studies that connect premature mortality to a wide range of causes, including the long-term health impacts of ozone and fine particulate matter with a diameter smaller than 2.5 micrometres (PM2.5). It has proved difficult to quantify premature mortality related to air pollution, notably in regions where air quality is not monitored, and also because the toxicity of particles from various sources may vary. Here we use a global atmospheric chemistry model to investigate the link between premature mortality and seven emission source categories in urban and rural environments. In accord with the global burden of disease for 2010 (ref. 5), we calculate that outdoor air pollution, mostly by PM2.5, leads to 3.3 (95 per cent confidence interval 1.61–4.81) million premature deaths per year worldwide, predominantly in Asia. We primarily assume that all particles are equally toxic, but also include a sensitivity study that accounts for differential toxicity. We find that emissions from residential energy use such as heating and cooking, prevalent in India and China, have the largest impact on premature mortality globally, being even more dominant if carbonaceous particles are assumed to be most toxic. Whereas in much of the USA and in a few other countries emissions from traffic and power generation are important, in the eastern USA, Europe, Russia and East Asia agricultural emissions make the largest relative contribution to PM2.5, with the estimate of overall health impact depending on assumptions regarding particle toxicity. Model projections based on a business-as-usual emission scenario indicate that the contribution of outdoor air pollution to premature mortality could double by 2050.

Air pollution is associated with many health impacts, including chronic obstructive pulmonary disease (COPD) linked to enhanced ozone (O3), and acute lower respiratory illness (ALRI), cerebrovascular disease (CEV), ischaemic heart disease (IHD), COPD and lung cancer (LC) linked to PM2.5 (ref. 8). Many previous studies have been based on air quality measurements, largely focusing on urban pollution. Atmospheric chemistry and transport models have been used to account for other environments, including those for which no measurement data are available.

Recently, enhanced-resolution regional and global models and satellite data have been applied to improve estimates of PM2.5 and O3 concentrations and their impact on air quality. Here we present results obtained with an atmospheric chemistry–general circulation model, applied at high resolution to compute global air quality changes, combined with population data, country-level health statistics and pollution exposure response functions (Methods). Our calculations of air-pollution-related mortality are based on the method of the global burden of disease (GBD) for 2010 (ref. 5), applying improved exposure response functions that account more realistically than former assessments for health effects at very high PM2.5 concentrations. This is particularly relevant for some parts of the world where air pollution has increased nearly unabated, and for future scenarios that project the continued growth of emissions. Following the GBD, we also include desert dust (which is largely natural) with PM2.5; hence, strictly speaking, we assess the effects of atmospheric composition.

The air quality guidelines of the World Health Organization (WHO) and national regulatory policies are based on exposure response functions that rely on PM2.5 mass concentrations, implicitly treating all fine particles as equally toxic without regard to their source and chemical composition. However, expert elicitation suggests that carbonaceous particles are more toxic than crustal material, nitrates and sulfates. A recent study finds that PM2.5 from coal combustion leads to increased mortality risk from cardiovascular disease and LC, but that the evidence is much weaker for other sources, whereas estimates using non-specific PM2.5 mass alone may underestimate the total effect of PM2.5 on mortality. Further, this study did not find support for mortality from biomass combustion and soil dust particles. However, this study and a subsequent report by the Health Effects Institute in the USA also note that there were only a limited number of cities in these investigations where these sources and components were likely to be measured consistently. While the evidence for differential toxicity is far from conclusive, we conducted a secondary analysis assuming that carbonaceous PM2.5 is five times more toxic than inorganic particles, while maintaining the same overall health impact of PM2.5.
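One way to picture "five times more toxic, with the same overall impact" is a renormalised weighting: each source's PM2.5 mass is weighted by 5 for its carbonaceous fraction and 1 for the rest, and the weights are rescaled so they still sum to the original total. This Python sketch is a hypothetical, linear simplification; the paper's actual procedure runs through exposure response functions and is not described in this excerpt, and the source names and numbers in the usage lines are invented.

```python
def toxicity_weighted_shares(pm_share, carbonaceous_fraction, factor=5.0):
    """Split a fixed total health impact among sources when carbonaceous PM2.5
    is `factor` times as toxic per unit mass as the other (inorganic) PM2.5.

    pm_share: source -> fraction of population-weighted PM2.5 mass (sums to 1).
    carbonaceous_fraction: source -> carbonaceous fraction of that source's PM2.5.
    Assumes impact is proportional to toxicity-weighted mass; normalising keeps
    the overall impact unchanged, so only the split between sources moves.
    """
    weights = {
        src: share * (factor * carbonaceous_fraction[src] + (1.0 - carbonaceous_fraction[src]))
        for src, share in pm_share.items()
    }
    total = sum(weights.values())
    return {src: w / total for src, w in weights.items()}


# Hypothetical two-source example (not values from this study).
shares = toxicity_weighted_shares({"combustion": 0.3, "dust": 0.7},
                                  {"combustion": 0.6, "dust": 0.0})
print(shares)
```

Under this kind of reweighting, sources rich in carbonaceous particles gain share at the expense of dust and inorganic aerosol, which is the direction of the shifts reported for residential energy use and agriculture.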

We have calculated premature mortality linked to CEV, COPD, IHD and LC for adults ≥30 years old, and ALRI for children <5 years old (Table 1 and Extended Data Tables 1 and 2). Our estimate of the global PM2.5-related mortality in 2010 is 3.15 million people, with a 95% confidence interval (CI95) of 1.52–4.60 million. The main causes are CEV (1.31 million) and IHD (1.08 million), and secondary causes are COPD (374 thousand), ALRI (230 thousand) and LC (161 thousand). Our global estimate of O3-related mortality by COPD is 142 (CI95: 90–208) thousand. Our total estimate of 3.30 (CI95: 1.61–4.81) million people in 2010 agrees closely with the GBD. This is in addition to the estimated 3.54 million deaths per year caused by indoor air pollution due to the use of solid fuels for cooking and heating. Figure 1 shows the geographic distribution and the locations of hotspots in China, India and many of the large urban centres.
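The headline totals are sums of the cause-specific central estimates in the World 2010 row of Table 1. A short Python check (variable names are ours); note that only central estimates add up this way, and the confidence intervals should not simply be summed.

```python
# World totals for 2010 from Table 1, in thousands of deaths.
pm25_deaths = {"CEV": 1311, "IHD": 1079, "COPD": 374, "ALRI": 230, "LC": 161}
o3_copd_deaths = 142

pm25_total = sum(pm25_deaths.values())    # PM2.5-related deaths
all_deaths = pm25_total + o3_copd_deaths  # PM2.5 + O3
print(f"PM2.5 only: {pm25_total} thousand; PM2.5 + O3: {all_deaths} thousand")
```

The sums, 3,155 and 3,297 thousand, correspond to the rounded 3.15 and 3.30 million quoted above.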

Considering the global population of 6.8 billion in 2010, it follows that the mean per capita mortality attributable to air pollution is about 5 per 10,000 person-years. Of these 5 persons per 10,000 worldwide, about 2 die of CEV, 1.6 of IHD, 0.8 of COPD, 0.35 of ALRI and 0.25 of LC. The highest per capita mortality is found in the Western Pacific region, followed by the Eastern Mediterranean and Southeast Asia. The combination of high per capita mortality with high population density explains the (by far) highest number of deaths in the Western Pacific, China being the main contributor (1.36 million per year). Note that the mortality attributable to air pollution in China is approximately an order of magnitude higher than that attributable to Chinese road transport injuries and HIV/AIDS, and ranks among the top causes of death. Southeast Asia has the second highest premature mortality, with India the main contributor (0.65 million per year). The global mortality linked to air pollution is strongly influenced by these high numbers in Asia.

Table 1 | Premature mortality related to PM2.5 and O3 for the population <5 and ≥30 years old

Mortality attributable to air pollution is given in deaths ×10³; the ALRI, IHD, CEV, COPD and LC columns are PM2.5-related and the last cause column is O3-related COPD.

WHO region | Year | Population (×10⁶) | ALRI <5 yr | IHD ≥30 yr | CEV ≥30 yr | COPD ≥30 yr | LC ≥30 yr | O3 COPD ≥30 yr | Total
Africa | 2010 | 809 | 90 | 55 | it | 1 | 2 | 2 | 23)
Africa | 2050 | 807 | 158 | 185 | 262 | 38 | 5 | 12 | 660
Americas | 2010 | 930 | 0 | 44 | 8 | 4 | 7 | 5 | 68
Americas | 2050 | il bl | 0 | 75 | 15 | i | 11 | 11 | 119
Eastern Mediterranean | 2010 | 602 | 56 | 115 | 86 | 2 | 5 | 12 | 286
Eastern Mediterranean | 2050 | 021 | 66 | 321 | 246 | 37 | 13 | 40 | 723
Europe | 2010 | 867 | 1 | 239 | 95 | 3 | ei | 6 | 381
Europe | 2050 | 886 | 1 | 307 | 156 | 8 | 37 | 11 | 530
Southeast Asia | 2010 | 762 | 64 | 327 | 250 | 124 | 15 | 82 | 862
Southeast Asia | 2050 | 2,332 | 104 | 865 | 807 | 419 | 48 | 227 | 2,470
Western Pacific | 2010 | 812 | 19 | 299 | 794 | 209 | 107 | 35 | 1,463
Western Pacific | 2050 | 861 | 16 | 413 | 1,120 | 309 | 155 | a7 | 2,070
World | 2010 | 6,783 | 230 | 1,079 | 1,311 | 374 | 161 | 142 | 3,297
World | 2050 | 9,098 | 346 | 2,166 | 2,604 | 828 | 270 | 358 | 6,572

Regions are defined by the World Health Organization; see Extended Data Table 1. Results for 2050 are based on a business-as-usual scenario.
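The per capita figures follow from the World 2010 row of Table 1 (deaths in thousands, population in millions). A small Python sketch with our own names; it suggests, as an inference rather than a statement from the paper, that the COPD figure of about 0.8 combines the PM2.5- and O3-related COPD deaths, since PM2.5-related COPD alone gives about 0.55 per 10,000.

```python
POPULATION_MILLIONS = 6783  # world, 2010 (Table 1)

deaths_thousands = {
    "CEV": 1311,
    "IHD": 1079,
    "COPD": 374 + 142,  # PM2.5-related plus O3-related COPD
    "ALRI": 230,
    "LC": 161,
}


def per_10k(deaths_k, population_m=POPULATION_MILLIONS):
    """Deaths per 10,000 people per year from thousands of deaths and millions of people."""
    return deaths_k * 1e3 / (population_m * 1e6) * 1e4


rates = {cause: per_10k(d) for cause, d in deaths_thousands.items()}
total_rate = per_10k(sum(deaths_thousands.values()))
for cause, r in rates.items():
    print(f"{cause}: {r:.2f} per 10,000")
print(f"All causes: {total_rate:.2f} per 10,000")
```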

We determined the impacts of seven source categories by sub- 
tracting them one by one from the emissions in our model. These 
sensitivity calculations show the efficacy of individually controlling 
these sources. The 15 countries with highest premature mortality 
attributable to air pollution in 2010 are listed in Table 2 along with 
the contribution of each source category. Residential and commercial 
energy use (RCO) is the largest source category worldwide, contrib- 
uting nearly one-third, and almost a factor of 2 more under the alterna- 
tive assumption of differential toxicity. Note that this only refers to 
mortality by outdoor exposure to this source. Our estimate of 1.0 mil- 
lion deaths per year by RCO is in addition to the 3.54 million deaths 
per year due to indoor air pollution from essentially the same source’. 
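The 1.0 million RCO figure is the RCO share applied to the global total. A minimal sketch of that arithmetic, assuming the equal-toxicity percentages of the World row of Table 2 (with industry at the roughly 7% stated in the text); `deaths_by_source` is a hypothetical helper for illustration, not part of the authors' workflow:

```python
# Hypothetical helper: split a total death count across source categories
# using the equal-toxicity percentage shares of the World row of Table 2.

TOTAL_DEATHS_2010 = 3297  # thousands of deaths per year (Table 1, World, 2010)

WORLD_SHARES_PCT = {
    "RCO": 31,  # residential and commercial energy use
    "AGR": 20,  # agriculture
    "NAT": 18,  # natural
    "PG": 14,   # power generation
    "IND": 7,   # industry
    "BB": 5,    # biomass burning
    "TRA": 5,   # land traffic
}


def deaths_by_source(total, shares_pct):
    """Return deaths per source category, in the same units as `total`."""
    return {src: total * pct / 100.0 for src, pct in shares_pct.items()}


deaths = deaths_by_source(TOTAL_DEATHS_2010, WORLD_SHARES_PCT)
for src, d in sorted(deaths.items(), key=lambda kv: -kv[1]):
    print(f"{src}: {d:,.0f} thousand deaths per year")
```

Because the seven shares add up to 100%, the per-source deaths add back up to the 3.3 million total.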

The next largest anthropogenic source category is agriculture 
(AGR), contributing one-fifth; however, this reduces significantly 
under the assumption of differential particle toxicity. The successive 
principal anthropogenic categories are power generation (PG), indus- 
try (IND), biomass burning (BB) and land traffic (TRA), and taken 
together they cause nearly one-third of all air pollution mortality. If 
carbonaceous particles are five times more toxic than sulfates and
nitrates, these sources together account for one-quarter of the
mortality. Natural sources make up the remaining one-sixth of the total.
However, if crustal material is five times less toxic than carbonaceous
PM2.5, this fraction reduces considerably. The most important source category
in each region in 2010 is shown in Fig. 2.

RCO is foremost in the populous parts of Asia. It refers to small 
combustion sources, especially biofuel use (for heating and cooking), 
and also waste disposal and diesel generators. In China it contributes 
about 32%, in India, Bangladesh, Indonesia and Vietnam 50-60%, 
while in Nepal it is highest with nearly 70% (Extended Data Table 3). 
In western countries it is typically 5-10%, although in France and 
Poland it contributes about 15%. The contribution of this pollution 
source to mortality is sensitive to toxicity assumptions and large uncer- 
tainty related to IHD. Because of the comparatively large fraction of 
carbonaceous PM2.5, under our alternative calculations where these
aerosols are five times more toxic, RCO increases from 31% to 59% 
of global air pollution mortality. If, on the other hand, we assume that 
RCO does not contribute to IHD mortality, this fraction decreases from 
31% to 26% (Methods). 
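How the differential-toxicity case shifts attribution toward carbon-rich sources can be illustrated with a deliberately simplified sketch. It assumes, hypothetically, that each source's share scales with its toxicity-weighted PM2.5 mass (carbonaceous mass counted five times) and uses invented concentrations; it is not the paper's model calculation.

```python
# Simplified, hypothetical illustration of the differential-toxicity sensitivity case.
CARBON_WEIGHT = 5.0  # carbonaceous PM2.5 weighted five times relative to other PM2.5

# Invented PM2.5 contributions (ug/m3) of two sources, split by composition
SOURCES = {
    "RCO": {"carbonaceous": 6.0, "other": 4.0},  # combustion source: carbon-rich
    "AGR": {"carbonaceous": 0.0, "other": 8.0},  # ammonium sulfate/nitrate only
}


def relative_shares(sources, carbon_weight):
    """Share of each source in the toxicity-weighted PM2.5 total."""
    weighted = {
        name: comp["carbonaceous"] * carbon_weight + comp["other"]
        for name, comp in sources.items()
    }
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}


for weight in (1.0, CARBON_WEIGHT):
    shares = relative_shares(SOURCES, weight)
    print(weight, {k: round(v, 2) for k, v in shares.items()})
```

The direction matches the text (RCO gains, AGR loses), but the magnitudes here come only from the invented inputs.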

Agriculture (AGR) has a remarkably large impact on PM2.5, and is
the leading source category in Europe, Russia, Turkey, Korea, Japan and
the Eastern USA (Fig. 2). In many European countries, its contribution
is 40% or higher. Agricultural releases of ammonia (NH3) from
fertilizer use and domesticated animals affect air quality through several
multiphase chemical pathways, forming ammonium sulphate and
nitrate. Since NH3 abundance is often limiting in PM2.5 formation,
reduction of its emissions can make an important contribution to air
quality control’. As agricultural emissions mostly form inorganic
PM2.5, the impact on mortality diminishes under the assumption that
carbonaceous PM2.5 is five times more toxic.

Figure 1 | Mortality linked to outdoor air pollution in 2010. Units of mortality, deaths per area of 100 km × 100 km (colour coded). In the white areas, annual mean PM2.5 and O3 are below the concentration-response thresholds where no excess mortality is expected.

Table 2 | Top 15 ranked countries of premature mortality linked to outdoor air pollution in 2010

Country     Deaths     Residential  Agriculture  Natural  Power       Industry  Biomass  Land
            (× 10³)    energy                             generation            burning  traffic
China         1,357    32 (76)      29 (7)        9 (3)   18 (7)       8 (3)     1 (2)    3 (2)
India           645    50 (77)       6 (1)       11 (1)   14 (5)       7 (3)     7 (9)    5 (4)
Pakistan        111    31 (67)       2 (1)       57 (23)   2 (1)       2 (2)     2 (3)    3 (3)
Bangladesh       92    55 (78)      10 (2)        0 (0)   15 (6)       7 (2)     7 (8)    6 (4)
Nigeria          89    14 (31)       1 (0)       77 (52)   0 (0)       0 (0)     8 (16)   0 (0)
Russia           67     7 (18)      43 (26)       1 (0)   22 (17)      8 (5)     8 (21)  11 (13)
USA              55     6 (12)      29 (17)       2 (2)   31 (19)      6 (5)     5 (9)   21 (36)
Indonesia        52    60 (64)       2 (0)        0 (0)    5 (3)       4 (2)    27 (29)   2 (2)
Ukraine          51     6 (13)      52 (32)       0 (0)   18 (17)      9 (7)     5 (18)  10 (13)
Vietnam          44    51 (74)      12 (2)        0 (0)   13 (4)       8 (3)    12 (14)   4 (3)
Egypt            35     1 (2)        3 (3)       92 (88)   2 (2)       1 (1)     0 (1)    1 (3)
Germany          34     8 (17)      45 (26)       0 (0)   13 (10)     13 (8)     1 (3)   20 (36)
Turkey           32     9 (20)      29 (19)      15 (6)   19 (14)     11 (8)     6 (19)  11 (14)
Iran             26     1 (3)        6 (6)       81 (75)   4 (4)       3 (3)     1 (2)    4 (7)
Japan            25    12 (29)      38 (22)       0 (0)   17 (15)     18 (14)    5 (8)   10 (12)
World         3,297    31 (59)      20 (7)       18 (11)  14 (7)       7 (3)     5 (8)    5 (5)

Columns 3–9 show contributions (%) of the seven main source categories, the leading one in bold. For details and additional countries, see Extended Data Table 3. In parentheses are shown sensitivity calculations with carbonaceous particles having a five times larger impact than inorganic aerosol compounds.

Natural sources (NAT) contribute strongly to mortality, being dom- 
inant in northern Africa and the Middle East, and also a leading category 
in Central Asia (Table 2 and Fig. 2). Although we categorize airborne 
desert dust as natural, a fraction is anthropogenic due to the role of 
humans in desertification and agricultural practices*®. The chronic health 
and mortality impacts associated with exposure to dust are more uncer- 
tain than those due to typical air pollution in industrialized countries 
where most of the epidemiological cohort studies have been carried out. 
If all fine particles are equally toxic, then natural sources are responsible 
for about one-sixth of air pollution mortality. If fine carbonaceous part- 
icles are five times more toxic than crustal material, then natural sources 
account for only about one-tenth of air pollution induced mortality. 

Power generation (PG) by fossil fuel fired power plants is the third
largest anthropogenic source category, being an important source of
SO2 and NOx, which are converted to sulfate and nitrate in the
atmosphere. It accounts for about one-seventh of population exposure to
PM2.5 and O3. Power plant emissions are quite important in the USA
(>30%) and in Russia, Korea and Turkey (roughly 20%). Emissions
from power generation also have particularly large impacts on fine
particle concentrations in the Middle East, but frequently these go
unnoticed as they are masked by desert dust. The role of this source
is sensitive to the assumed PM2.5 toxicity, reducing by a factor of 2 if
sulfate and nitrate are five times less toxic than carbonaceous PM2.5.

Industry (IND) is among the smaller source categories, with a global 
fraction of about 7% (Table 2); nevertheless, it contributes about twice 
this percentage in most of the western world. It includes iron and steel, 
chemical, pulp and paper, food, solvent and other manufacturing sec- 
tors, oil refineries and fuel production. This source of air pollution is 
generally significant in industrialized countries and emerging eco- 
nomies, but rarely the leading cause of premature mortality. Under 
the differential toxicity assumption, its contribution to mortality 
would reduce by more than a factor of 2. 

Our calculations suggest that land traffic (TRA) emissions are
responsible for about one-fifth of mortality by ambient PM2.5 and
O3 in Germany, the UK and the USA, while globally they account
for about 5%. Because emissions of NOx are the dominant source of
traffic-related PM2.5 in the form of nitrate, together with carbonaceous
PM2.5, the results from our alternative calculations (assuming carbonaceous
particles are five times more toxic than nitrates and other inorganics)
also indicate a 5% contribution, globally. Note that this
contribution is likely to be a lower limit, as traffic also emits other
pollutants that are not included in our model or do not influence PM2.5
(ref. 31) (Methods).

Figure 2 | Source categories responsible for the largest impact on mortality linked to outdoor air pollution in 2010. Source categories (colour coded): IND, industry; TRA, land traffic; RCO, residential and commercial energy use (for example, heating, cooking); BB, biomass burning; PG, power generation; AGR, agriculture; and NAT, natural. In the white areas, annual mean PM2.5 is below the concentration-response threshold.

Biomass burning (BB) is also a relatively small source category with 
a global contribution of about 5%. Nevertheless, its areal range is large, 
for example in South America and Africa. It is the main source of air 
pollution in large parts of Canada, Siberia, Africa, South America and 
Australia. Because in many parts of these regions annual mean PM2.5
is below the concentration-response threshold (Methods), these areas 
are shown white in Fig. 2. Biomass burning is also widespread in 
southeastern Asia, although in populous parts of Vietnam and 
Indonesia (for example, Java) residential energy use is larger and there- 
fore the leading category (Table 2). 

In the Southern Hemisphere biomass burning is generally the
leading contributor to PM2.5, with some exceptions. In Brazil it contributes
about 70%, and in many African countries its impact can also be high,
up to >90% in Angola. Note that the health impacts of PM2.5 from
biomass burning are quite uncertain, especially the attribution of IHD 
related mortality, due to a dearth of epidemiological cohort studies in 
regions where this pollution source predominates (Methods). Our 
calculations suggest that it is responsible for between 5% (equal tox- 
icity) and 8% (differential toxicity) of air pollution induced mortality. 

To understand how the premature mortality attributable to air 
pollution may develop in the coming decades, we applied a busi- 
ness-as-usual (BaU) emission scenario for the years 2025 and 2050, 
assuming that only currently agreed legislation is implemented that 
will affect future emissions”. Thus air quality and emission standards 
are fixed. Results for 2050 are presented here, and for 2025 in Extended 
Data Fig. 2 and Extended Data Tables 4, 5. Under the BaU scenario, 
moderate though significant increases of premature mortality will 
occur in Europe and the Americas, to a large degree in urban areas. 
Large increases are projected in Southeast Asia and the Western 
Pacific, leading to a global growth of premature mortality to 6.6
(CI95: 3.4–9.3) million (+100%) in 2050 (Table 1). This compares
to a negligible population increase of infants (<5 years old), and a
substantial increase (+68%) among people ≥30 years old in 2050
(implying an ageing population). Globally, the per capita mortality
is projected to increase from 5 per 10,000 person-years in 2010 to about
7 per 10,000 person-years in 2050. The mortality attributable to air
pollution will continue to be dominated by Asia with an unchanged 
fraction of about 75%. 
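The per capita figures and the near doubling can be checked against the World rows of Table 1. A small sketch, assuming the deaths are spread over the whole population (the text's "about 5" and "about 7" are rounded values):

```python
# World totals from Table 1: population in millions, deaths in thousands per year
WORLD = {
    2010: {"pop_millions": 6783, "deaths_thousands": 3297},
    2050: {"pop_millions": 9098, "deaths_thousands": 6572},
}


def per_10k(pop_millions, deaths_thousands):
    """Annual deaths per 10,000 people."""
    return deaths_thousands * 1e3 / (pop_millions * 1e6) * 1e4


for year, row in WORLD.items():
    print(f"{year}: {per_10k(**row):.1f} per 10,000 per year")

growth = WORLD[2050]["deaths_thousands"] / WORLD[2010]["deaths_thousands"] - 1.0
print(f"change 2010-2050: {growth:+.0%}")
```

From the rounded table values the increase comes out just under +100%, consistent with the doubling quoted in the text.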

The urban population is expected to grow relatively rapidly from 
3.6 billion in 2010 to 5.2 billion in 2050, and combined with increasing 
air pollution concentrations the health impacts will escalate. Our 
estimate of urban premature mortality by outdoor air pollution in 
2010 is 2.0 million, increasing to 4.3 million in 2050, representing 
60% of the global total in 2010 and 65% in 2050. Urban population 
growth is responsible for part of this change, but the levels of air 
pollution in urban areas are also projected to grow rapidly. This is 
evident from our finding that the per capita mortality attributable to 
air pollution in 2010 is about 50% higher in urban than in rural envir- 
onments. Under the BaU scenario this difference is expected to 
increase to nearly 90% in 2050. 

Recently, much emphasis has been placed on rapidly emerging 
megacities (Methods). We calculate that 17 megacities and conurba- 
tions in Asia rank among the top 30 in terms of premature mortality 
worldwide, the leading one being the Pearl River Delta. When viewed 
instead from the perspective of individual risk, Tianjin and Beijing 
rank highest (Extended Data Table 6). While the per capita mortality 
attributable to air pollution is already extraordinary in Chinese mega- 
cities, according to the BaU scenario it will become even higher in 
Chinese and also Indian megacities by 2050. The combined premature 
mortality in the 30 largest conurbations accounts for about 7% of
the worldwide burden of air pollution, indicating the relevance of all 
urban areas. 

Our results suggest that if the projected increase in mortality attrib- 
utable to air pollution is to be avoided, intensive air quality control 
measures will be needed, particularly in South and East Asia. The 
poorly characterized uncertainty about the relative toxicity of various 
classes of particles such as sulfates, nitrates, organics, crustal materials, 
black carbon, and especially smoke from biomass combustion, limits 
unambiguous attribution of sources. Nevertheless, our study suggests 
that emissions from residential energy use should be considered in air 
pollution control strategies and, if all fine particles are equally toxic, the 
reduction of agricultural emissions would improve air quality. An 
improvement in the efficacy of air pollution controls requires a better 
understanding of the relative toxicity of particles from various emis- 
sions sources. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

Model and emissions. We used the global ECHAM5/MESSy atmospheric chem- 
istry (EMAC) general circulation model at a spatial resolution of T106L31, that is, 
with a spherical spectral truncation of T106, which corresponds to a quadratic
Gaussian grid of approximately 1.1° × 1.1° latitude × longitude (~110 km at the
Equator), with 31 vertical hybrid terrain-following and pressure levels up to 10 hPa
in the lower stratosphere. The core atmospheric model is the 5th generation
European Centre Hamburg (ECHAM5, version 5.3.01) general circulation model”.
EMAC includes sub-models that represent tropospheric and stratospheric pro- 
cesses and their interaction with oceans, land and human influences****. It uses 
the Modular Earth Submodel System (MESSy, v.1.09) to link submodels that 
describe emissions, atmospheric chemistry, aerosol and deposition processes; the 
results have been tested against in situ and remote sensing observations” ”’. 

Following up on Lelieveld et al.”, who focused on the year 2005, we present 
results for the years 2010, 2025 and 2050, applying monthly varying emission data 
from Doering et al.*°, also used by Pozzer et al.**. The data are from the Emission 
Database for Global Atmospheric Research (EDGAR), prepared by the Joint 
Research Centre of the European Commission in Ispra (Italy) at a resolution of 
0.1° latitude and longitude***’. For the year 2010 we performed sensitivity calcula- 
tions in which seven main emission categories have been removed one by one to 
compute the impact of these sources and to estimate their contributions to air 
quality control and related mortality. We first calculated the apportionment of
source categories to the total PM2.5 and O3 concentrations and then applied the
computed fractions to the total mortalities attributable to air pollution.

The categories are: (1) ‘Natural’ (NAT), mostly desert dust but locally also sea 
salt and dimethyl sulphide derived sulphate, some nitrate and ammonium from 
natural sources, volcanic sulphur emissions and organics released by the vegeta- 
tion; (2) ‘Industry’ (IND), including iron and steel, chemical, pulp and paper, 
food, solvent and other manufacturing sectors, oil refineries and fuel production; 
(3) ‘Land transport’ (TRA), that is, road and non-road transport on land; 
(4) ‘Residential and commercial energy use’ (RCO), referring to local and com- 
mercial energy use from small combustion sources for space heating and cooking, 
including diesel generators and biofuel use; (5) ‘Power generation’ (PG), that is, 
public energy production by fossil fuel fired power plants; (6) ‘Biomass burning’
(BB), that is, tropical forest fires and deforestation, savanna and shrub fires, middle 
and high latitude forest and grassland fires, and agricultural waste burning; and 
(7) ‘Agriculture’ (AGR), dominated by ammonia emissions associated with the use 
of fertilizers and domesticated animals. Not included in these categories are air 
traffic and shipping. We find that the removal of individual source categories leads 
to a near-linear response in the modelled contributions to mortality, indicated by 
the small scaling corrections needed (about 10%) to add up to 100% in the country 
level contributions, that is, in Table 2 and Extended Data Table 3. 
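The apportionment procedure (remove one source at a time, convert the resulting concentration drops into fractions, rescale them so they add up to 100%, then apply them to the attributable deaths) can be sketched as below. The concentrations are invented and `apportion` is a hypothetical helper. In this toy example the correction is about 4%, whereas the text reports roughly 10%.

```python
# Hypothetical sketch of source apportionment by removing one source at a time.


def apportion(base_conc, conc_without, total_deaths):
    """Split total_deaths across sources using "remove one source" model runs.

    base_conc    -- concentration with all sources included (e.g. ug/m3)
    conc_without -- dict: source -> concentration when that source is removed
    total_deaths -- attributable deaths for the same area
    Returns (deaths per source, scaling factor applied).
    """
    # Fractional drop in concentration when each source is switched off
    raw = {src: (base_conc - c) / base_conc for src, c in conc_without.items()}
    # The response is only near-linear, so the fractions need not sum to 1;
    # rescale them so that they add up to 100%.
    scale = 1.0 / sum(raw.values())
    deaths = {src: total_deaths * frac * scale for src, frac in raw.items()}
    return deaths, scale


runs = {"RCO": 35.0, "AGR": 40.0, "NAT": 42.0, "PG": 44.0,
        "IND": 46.0, "BB": 48.0, "TRA": 47.0}
deaths, scale = apportion(50.0, runs, total_deaths=100.0)
print(f"scaling factor: {scale:.3f}")
for src, d in deaths.items():
    print(f"{src}: {d:.1f}")
```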

The BaU scenarios for 2025 and 2050 assume that energy and food consump- 
tion are largely determined by population growth and economic development, 
which in turn drive air pollution sources based on current legislation and 
technology***’*'. This represents a pessimistic, but plausible future prospect. 
Comparable to Shindell et al.’, and different from the Representative
Concentration Pathways of the Intergovernmental Panel on Climate Change”, 
the BaU scenario differentiates between air pollution and climate change mitiga- 
tion measures, as the latter typically require relatively long-term and structural 
societal changes. The scenarios used here are based on projections for energy and 
fuel computed by the Prospective Outlook for the Long-term Energy System 
(POLES) model*'™ and for agriculture, land-use and waste projections by the 
Integrated Model to Assess the Global Environment (IMAGE)”. 

The population development in the BaU scenario is consistent with our mor- 
tality calculations, as described below, projecting 9 billion people in 2050. For 
additional details we refer to Pozzer et al.” and references therein. While BaU 
projections should not be conceived as ‘predictions’, especially for 2050, they 
represent the current trajectory into the future and may be considered a worst- 
case scenario, to explore what can be expected if air quality policies and health care 
remain as they are today. Note that these results are not sensitive to differential
toxicity assumptions, as the total mortality induced by PM2.5 is not affected, only
the attribution to source categories. For the future scenarios we used the baseline 
mortalities for 2010. Hence the implicit assumption is that smoking habits, diets 
and health care remain unchanged. 

The model meteorology has been forced by pre-calculated sea surface tempera- 
tures and ice coverage based on a 10-year climatology (2000-2009) adopted from 
the AMIP-II database**”’. The model was applied in atmospheric chemistry-trans- 
port mode by switching the coupling between radiation and atmospheric chem- 
istry off, so that atmospheric composition changes do not influence the model 
dynamics”. This is justified considering that air quality projections are primarily 
driven by emissions rather than climate change**’, even though natural
sources, biomass burning and deposition processes can be influenced by climatic
conditions”. For example, Fang et al.” project a 4% climate change effect for
PM2.5 related mortality and less than 1% for O3 related mortality by the end of the
21st century.

Although our model resolution does not resolve small-scale heterogeneities in the 
urban environment, a comparison with satellite and ground-based remote sensing 
observations indicates that this is not critical. The exposure response functions used 
to calculate mortalities are based on annual mean concentrations for which these 
heterogeneities largely average out. This is illustrated by Extended Data Fig. 3, which 
compares a simulation for the year 2010 with ground-based AERONET remote 
sensing data of aerosol optical depth (AOD) (http://aeronet.gsfc.nasa.gov). Since 
our model approximates, but does not replicate, meteorological conditions for the
year 2010, and local flows near the AERONET stations cannot be captured, substan- 
tial scatter around the ideal 1:1 comparison is expected. The comparison shows that 
the model mean error and bias are small (the latter absent for the annual mean), 
and the correlation good. We have also performed a comparison between MODIS 
(satellite) and AERONET data of AOD, leading to similar spread and correlations, 
the latter also increasing through averaging (not shown). 
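The comparison with AERONET relies on standard summary statistics. A minimal sketch of mean bias, mean absolute error and Pearson correlation, using invented annual-mean AOD values at four hypothetical stations. The exact statistics behind Extended Data Fig. 3 are not specified here, so this choice of metrics is an assumption:

```python
import math


def evaluation_stats(model, obs):
    """Mean bias, mean absolute error and Pearson correlation of model vs. observations."""
    n = len(obs)
    diffs = [m - o for m, o in zip(model, obs)]
    bias = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    mean_m, mean_o = sum(model) / n, sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    sd_m = math.sqrt(sum((m - mean_m) ** 2 for m in model))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    return bias, mae, cov / (sd_m * sd_o)


# Hypothetical annual-mean AOD at four stations (model vs. observations)
obs = [0.10, 0.25, 0.40, 0.60]
model = [0.12, 0.22, 0.43, 0.58]
bias, mae, r = evaluation_stats(model, obs)
print(f"bias = {bias:+.3f}, MAE = {mae:.3f}, r = {r:.3f}")
```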

The primary differences in the relationships between emissions and exposures 
for ground level sources, such as traffic, in comparison with elevated sources, such 
as power plants, have been accounted for in our model"’. The relative impacts of 
secondary particles (such as sulfates and nitrates) from these sources are expected 
to be realistically simulated. On the other hand, models such as ours cannot 
capture the fine structure of near-source gradients in ultrafine PM along trans- 
portation corridors. Because of this our estimates of the relative impacts of urban 
traffic and urban sources of primary fine particles may be biased downward, 
though only to the extent that ultrafine PM is in fact responsible for the mortality 
seen in cohort studies. As discussed above, the relative toxicity of various
constituents of ambient PM2.5 has not been well established. Our sense is that the
sensitivity study, allowing for carbonaceous particles to be five times as toxic as
sulfates, nitrates and crustal material, is adequate to cover any potential differences
in the relationships between emissions, exposure and differential toxicity of
traffic-related PM2.5.

To investigate whether our model reproduces urban concentration increments of PM2.5
and O3, that is, comparing the urban background with the rural environment, we
compare our results with recent case studies®*’. For Paris and London our model
computes urban PM2.5 increments of 18% and 2%, respectively, consistent with the
measurements and highly resolved model calculations. Our model calculations
suggest that the leading sources of PM2.5 in Paris are residential energy use, agriculture
and traffic. Agricultural emissions (NH3/NH4+) are transported from the rural
environment and contribute to PM2.5 in the city. For London we calculate that
PM2.5 is most strongly influenced by agriculture, traffic and power generation. The
limited contribution by land traffic and the importance of atmospheric transport for
air quality in London have been corroborated by observational analysis®. For Beijing
we calculate an urban PM2.5 increment of 5%, consistent with the conclusion by
Zhang et al.’ that regional sources are crucial contributors to PM2.5. They estimate
the contribution by traffic and waste incineration at 4%; our results suggest that traffic
alone contributes 3% in this city and residential energy use 47%, which we find to be
representative of China (Table 2).

Our model calculations indicate that these relatively small urban increments for
PM2.5 are typical for many, though not all, cities. For example, for Johannesburg
(including Pretoria) we find +41% and for the Pearl River area +62%, and in both
conurbations residential energy use is the leading source of PM2.5. For O3 we find
generally small and negative urban increments due to titration of O3 by local traffic 
emissions (in Paris —7% and in London —5%). Negative urban increments due to 
NO by traffic of a few per cent (comparing weekend with weekdays) have also been 
documented for American cities**. For Chicago, New York, Los Angeles and Atlanta 
we find negative O3 increments of 1-5% due to traffic and power generation. 
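The urban increments quoted above are relative differences between urban-background and rural concentrations. A sketch, assuming the increment is defined as (urban − rural)/rural. The input concentrations are invented, chosen only to reproduce increments of the size reported for Paris (+18% for PM2.5, −7% for O3):

```python
def urban_increment_pct(urban, rural):
    """Relative urban increment (%) of a concentration over the rural background."""
    return 100.0 * (urban - rural) / rural


# Hypothetical annual means: PM2.5 in ug/m3, O3 in p.p.b.v.
pm_inc = urban_increment_pct(17.7, 15.0)
o3_inc = urban_increment_pct(37.2, 40.0)
print(f"PM2.5: {pm_inc:+.0f}%")
print(f"O3:    {o3_inc:+.0f}%")
```

A negative O3 value reflects titration of O3 by locally emitted NO, as described above.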
Sample size. No statistical methods were used to predetermine sample size. 
Exposure response functions. The premature mortality attributable to PM2.5 and
O3 has been calculated by applying the EMAC model for the present (2010) and
projected future (2025, 2050) concentrations. We combined the results with epi- 
demiological exposure response functions by employing the following relationship 
to estimate the excess (that is, premature) mortality: 


$$\Delta\mathrm{Mort} = y_0 \left[ \frac{RR - 1}{RR} \right] \mathrm{Pop} \qquad (1)$$


ΔMort is a function of the baseline mortality rate y0 for a particular disease
category, for countries and/or regions estimated by the World Health
Organization (the regions and strata are listed in the Extended Data Table 1).
The term (RR — 1)/RR is the attributable fraction and RR is the relative risk. The 
disease specific baseline mortality rates have been obtained from the WHO Health 
Statistics and Health Information System. The value of RR is calculated for the 
different disease categories attributed to PM2.5 and O3 for the population below
5 years of age (ALRI) and 30 years and older (IHD, CEV, COPD, LC) using 
exposure response functions from the 2010 GBD analysis of the WHO (and
described below). 
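Equation (1) is applied per disease category and the results are summed. A minimal sketch for a single hypothetical grid cell, with invented baseline rates, relative risks and population; in practice the RR values would come from the exposure response functions described below:

```python
def attributable_fraction(rr):
    """(RR - 1)/RR: fraction of baseline deaths attributable to the exposure."""
    return (rr - 1.0) / rr


def excess_mortality(y0, rr, pop):
    """Eq. (1): Delta Mort = y0 * (RR - 1)/RR * Pop.

    y0  -- baseline mortality rate for one disease category (deaths per person per year)
    rr  -- relative risk for that category at the local exposure level
    pop -- exposed population in the relevant age group
    """
    return y0 * attributable_fraction(rr) * pop


# Hypothetical grid cell, population aged 30 years and older
cell = {
    # category: (baseline rate, relative risk), both invented
    "IHD": (1.5e-3, 1.25),
    "CEV": (1.0e-3, 1.40),
}
pop_30plus = 200_000
total = sum(excess_mortality(y0, rr, pop_30plus) for y0, rr in cell.values())
print(f"excess deaths per year: {total:.1f}")
```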

The population (Pop) data for regions, countries and urban areas have been 
obtained from the NASA Socioeconomic Data and Applications Center (SEDAC), 
hosted by the Columbia University Center for International Earth Science 
Information Network (CIESIN), available at a resolution of 2.5′ × 2.5′ (about
5 km × 5 km) (http://sedac.ciesin.columbia.edu/), and projections by the United
Nations Department of Economic and Social Affairs/Population Division”
(http://esa.un.org/unpd/wpp). Urban areas are defined by applying a population
density threshold of 400 individuals per km², while for megacities and major
conurbations the threshold is 2,000 individuals per km². We note that the
resolution of our atmospheric model, about 1° latitude/longitude, is coarser than that
of the population data, and our model does not resolve details of the urban 
environment. However, our anthropogenic emission data are aggregated from a 
resolution of 10 km to that of the model grid, accounting for relevant details such 
as altitude dependence (for example, stack emissions and hot plume rise effects)’. 
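The urban and megacity definitions reduce to density thresholds on the population grid. A sketch, assuming density is population divided by the area of a 2.5′ × 2.5′ cell (small-cell spherical approximation, Earth radius 6,371 km) and that the thresholds are inclusive (≥); the helper names are hypothetical:

```python
import math

EARTH_RADIUS_KM = 6371.0
URBAN_DENSITY = 400.0      # persons per km^2 (urban areas)
MEGACITY_DENSITY = 2000.0  # persons per km^2 (megacities and major conurbations)


def cell_area_km2(lat_deg, res_arcmin=2.5):
    """Approximate area of a res x res arc-minute cell centred at lat_deg.

    Small-cell approximation: (R * dlat) * (R * dlon * cos(lat)).
    """
    d = math.radians(res_arcmin / 60.0)
    return (EARTH_RADIUS_KM * d) ** 2 * math.cos(math.radians(lat_deg))


def classify_cell(population, lat_deg):
    """Classify a grid cell by population density (thresholds assumed inclusive)."""
    density = population / cell_area_km2(lat_deg)
    if density >= MEGACITY_DENSITY:
        return "megacity/conurbation"
    if density >= URBAN_DENSITY:
        return "urban"
    return "rural"


print(round(cell_area_km2(0.0), 1))  # cell area at the Equator, km^2
print(classify_cell(60_000, lat_deg=40.0))
```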

Lelieveld et al.*? (henceforth L2013) derived the relative risk RR from the fol- 
lowing exposure response function: 


$$RR = \exp[b(X - X_0)] \qquad (2)$$


The term X represents the model calculated annual mean concentration of PM2.5 or
O3. The value of X0 is the threshold concentration below which no additional risk is
assumed (concentration-response threshold). The parameter b is the concentration
response coefficient. However, it has been argued that this expression is based on
epidemiological cohort studies in the USA and Europe where annual mean PM2.5
concentrations are typically below 30 μg m⁻³, which may not be representative for
countries where air pollution levels can be much higher, for example in South and
East Asia. This is particularly relevant for our BaU scenario. Therefore, here we have
used the revised exposure response function of Burnett et al.*, who also included
epidemiological data from the exposure to second-hand smoke, indoor air pollution
and active smoking to account for high PM2.5 concentrations, and tested eight
different expressions. The best fit to the data was found for the following relationship,
which was also used by Lim et al.* for the GBD for the year 2010:


$$RR = 1 + a\left\{1 - \exp\left[-b(X - X_0)^{\rho}\right]\right\} \qquad (3)$$


The RR functions were derived by Burnett et al.*. We applied this model to the
different disease categories, as represented in their figures 1 and 2, where it was
shown to be superior to other forms previously used in burden assessments. We also
adopted the upper and lower bounds, likewise shown in these figures, representing
the 95% confidence intervals (CI95). The latter were derived based on Monte Carlo
simulations, leading to
1,000 sets of coefficients and exposure response functions from which the upper 
and lower bounds were calculated. 
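Equation (3) and the Monte Carlo construction of CI95 can be sketched as follows. The coefficient distributions and the threshold X0 = 5.8 μg m⁻³ are hypothetical placeholders, not the fitted values of Burnett et al., and the bounds are simple empirical percentiles taken from 1,000 sampled coefficient sets:

```python
import math
import random


def rr_ier(x, x0, a, b, rho):
    """Eq. (3): RR = 1 + a * (1 - exp(-b * (X - X0)**rho)); RR = 1 at or below X0."""
    if x <= x0:
        return 1.0
    return 1.0 + a * (1.0 - math.exp(-b * (x - x0) ** rho))


def mc_bounds(x, x0, n=1000, seed=0):
    """Empirical 2.5th and 97.5th percentiles of RR over n sampled coefficient sets.

    The coefficient distributions are HYPOTHETICAL placeholders.
    """
    rng = random.Random(seed)
    draws = sorted(
        rr_ier(
            x,
            x0,
            a=rng.lognormvariate(math.log(0.8), 0.2),
            b=rng.lognormvariate(math.log(0.02), 0.2),
            rho=rng.uniform(0.8, 1.2),
        )
        for _ in range(n)
    )
    return draws[int(0.025 * n)], draws[int(0.975 * n) - 1]


X0 = 5.8  # hypothetical threshold concentration (ug/m3)
central = rr_ier(50.0, X0, a=0.8, b=0.02, rho=1.0)
low, high = mc_bounds(50.0, X0)
print(f"RR at 50 ug/m3: {central:.2f} (95% interval {low:.2f}-{high:.2f})")
```

Unlike the log-linear form of Eq. (2), this function flattens at high concentrations, which is why it suits the very polluted regions discussed above.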

Following Burnett et al.* and Lim et al. we combine all aerosol types, hence 
including natural particulates such as desert dust. Note that by using PM2.5 mass,
we do not distinguish the possibly different toxicity of various kinds of particles. This 
information is not available from epidemiological cohort studies, but could poten- 
tially substantially affect both our overall estimates of mortality and the geographical 
patterns. This is addressed by sensitivity calculations presented in the main text, 
Table 2 and Extended Data Fig. 1. For COPD related to O3 we applied the exposure
response function by Ostro et al.’: 


$$RR = \left[\frac{X + 1}{X_0 + 1}\right]^{b} \qquad (4)$$


where b is 0.1521 and X0 is the average of the range 33.3–41.9 p.p.b.v. O3 indicated by
Lim et al.°, that is, 37.6 p.p.b.v. Previously we used model calculated pre-industrial O3
concentrations to estimate X − X0 (ref. 21), leading to about 20% higher estimates for
mortality by ‘respiratory disease’ related solely to O3 compared to the current
estimate for COPD due to both PM2.5 and O3.

For detailed discussion of uncertainties and sensitivity calculations that address 
the shape of exposure response functions, we refer to earlier work**'”? and 
references therein. L2013 estimated statistical uncertainties by propagating the 
quantified (random) errors of all parameters in the exposure response functions. 
They found that the CI95 of estimated mortality attributable to air pollution in
Europe, North and South America, South and East Asia are within 40%, whereas 
they are 100-170% in Africa and the Middle East. Our results are very close to the 
GBD, which substantiates the estimates by Lim et al.* and provides consistency 
with the most recent estimates for 2010, serving as a basis for our investigations. 
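For COPD related to O3, Eq. (4) uses the stated values b = 0.1521 and X0 = 37.6 p.p.b.v. A sketch that evaluates RR and the attributable fraction (RR − 1)/RR from Eq. (1). Returning RR = 1 at or below X0 is our assumption, following the definition of X0 as the threshold below which no additional risk is assumed:

```python
B_O3 = 0.1521  # exponent b in Eq. (4)
X0_O3 = 37.6   # threshold X0 in p.p.b.v. (mid-point of 33.3-41.9)


def rr_o3_copd(x_ppbv, b=B_O3, x0=X0_O3):
    """Eq. (4): RR = ((X + 1) / (X0 + 1)) ** b for COPD related to O3."""
    if x_ppbv <= x0:
        return 1.0  # assumption: no excess risk at or below the threshold
    return ((x_ppbv + 1.0) / (x0 + 1.0)) ** b


def attributable_fraction(rr):
    """(RR - 1)/RR, as in Eq. (1)."""
    return (rr - 1.0) / rr


for x in (35.0, 45.0, 60.0):
    rr = rr_o3_copd(x)
    print(f"O3 = {x:.0f} p.p.b.v.: RR = {rr:.3f}, AF = {attributable_fraction(rr):.3f}")
```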

We emphasize that the confidence intervals described here, and those reported 
by Lim et al.’, reflect only the statistical uncertainty of the parameters used in the 
concentration-response functions. It is known that the uncertainty in interpreta- 
tion of epidemiological results can be dominated by other model or epistemic 
uncertainties, such as those having to do with the control of confounders. Sources 
of uncertainty have been summarized by Kinney et al.’’, who underscore the need
to determine the differential toxicity of specific component species within the
complex mixture of particulate matter. Our sensitivity calculations (Table 2 and
Extended Data Fig. 1) corroborate that this can have significant influence,
especially in areas where carbonaceous compounds contribute strongly to PM2.5.

We emphasize the dearth of studies that link PM2.5 from biomass combustion
emissions (rich in carbonaceous particles) to IHD. Expert judgment studies on
the toxicity of particulate matter have reported uncertainties much larger than
those suggested by analysis of parameter uncertainty alone’*”. Although the CI95
intervals provided above include a larger range of parameters and uncertainties
than these earlier studies, they should be viewed as lower bounds on the true
uncertainty in estimates of the health effects of PM2.5 exposure, especially PM2.5
from biomass burning and biofuel use. If we consider the possibility that biomass
burning (BB, including agricultural waste burning) and residential energy use 
(RCO, dominated by biofuel use) do not contribute to mortality by IHD, the total 
mortality attributable to air pollution would decrease from 3.3 to 3.0 million per 
year (Extended Data Table 7). The largest effect is found in Southeast Asia where 
biomass combustion (RCO and BB) is a main source of air pollution. While the 
global contribution by residential energy use, as presented in Table 2, would 
decrease from 31% to 26%, and of biomass burning from 5% to 4% (the other 
categories increase proportionally), the ranking of the different sources and hence 
our conclusions remain unchanged, as RCO and BB would still be the largest and 
smallest source category, respectively. 

Issues such as the shape of the concentration-response functions and the existence
and specific levels of concentration-response thresholds have been discussed
by the experts'®’!”*. These have been accounted for by Burnett et al.*; however,
uncertainty related to the differences in central estimates given by various
cohort studies is not reflected in the estimates of parameter uncertainty by Lim
et al.°. This problem has grown more substantial recently as the results from
new cohort studies have become available”. Furthermore, uncertainty about
the relative toxicity of different constituents of PM2.5 remains. Since the current
study underscores that the sources of mortality attributable to PM2.5 can differ
strongly between different regions (Fig. 2), this aspect merits greater attention
in future.
Comparison to previous work. We estimate the combined (PM2.5- and O3-related) global mortality attributable to air pollution in 2010 at 3.3 million. Our global estimate for PM2.5-related mortality of 3.15 million per year is close to the 3.22 million per year in the GBD study for 2010 (ref. 4). However, it is substantially higher than the 2.1 million per year found by the recent multi-model study of Silva et al. for the year 2000. The difference can be explained by the focus of Silva et al. on anthropogenic pollution in 2000, whereas our study and the GBD account for emission increases between 2000 and 2010 and also include natural sources.

Our global estimate of O3-related mortality by COPD in 2010 is 142,000. This is substantially lower than the estimates of Anenberg et al. (700,000 deaths in 2000), L2013 (773,000 in 2005) and Silva et al. (470,000 deaths in 2000), but quite close to the GBD estimate of 152,000 deaths in 2010. Much of the difference between our results (and those from the 2010 GBD) and previous work is explained by the fact that we attribute COPD to both O3 and PM2.5. When our results for COPD from both O3 and PM2.5 are combined, our overall estimate of COPD mortality from air pollution agrees with the above-mentioned studies within about 25-30%. The remaining differences are largely due to the use of a concentration-response threshold, X₀, in our new work, which substantially reduces mortality estimates. Anenberg et al. and L2013 did not apply a threshold but computed the natural background based on preindustrial emissions. In these analyses the calculated ambient concentrations are typically lower than X₀. For example, the global average surface O3 concentration in our preindustrial simulation is 19 p.p.b.v. The global mortality estimate for 2010 presented here is 10% higher than that of L2013 for 2005, primarily because we also account for natural sources in the present work. If we subtract the natural fraction, our estimate of mortality attributable to anthropogenic air pollution for 2010 is 9% lower than that of L2013, mostly because of the new exposure-response functions applied here.
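The reason a concentration-response threshold lowers the estimates can be shown with a minimal sketch. It assumes a log-linear relative risk, RR = exp[β(X − X₀)] above the threshold X₀ and RR = 1 below it, with attributable fraction AF = (RR − 1)/RR. The values of β, X₀, the baseline death rate and the population are illustrative placeholders, not the parameters used in this study. Only the 19 p.p.b.v. preindustrial mean is taken from the text.

```python
import math


def o3_copd_deaths(conc_ppbv, baseline_rate, population, beta, threshold_ppbv=0.0):
    """Attributable COPD deaths under an assumed log-linear relative risk.

    RR = exp(beta * max(0, X - X0)) and AF = (RR - 1) / RR, so any concentration
    at or below the threshold X0 contributes no attributable deaths.
    """
    excess = max(0.0, conc_ppbv - threshold_ppbv)
    rr = math.exp(beta * excess)
    attributable_fraction = (rr - 1.0) / rr
    return baseline_rate * attributable_fraction * population


# Illustrative placeholders only (not the study's parameters).
BETA = 0.004           # per p.p.b.v.
X0 = 35.0              # threshold, p.p.b.v.
Y0, POP = 1e-3, 1e6    # baseline COPD death rate per person-year, population

for conc in (19.0, 50.0):  # 19 p.p.b.v. is the preindustrial mean quoted in the text
    without = o3_copd_deaths(conc, Y0, POP, BETA)
    with_x0 = o3_copd_deaths(conc, Y0, POP, BETA, X0)
    print(f"{conc:4.0f} p.p.b.v.: {without:6.1f} deaths without X0, {with_x0:6.1f} with X0")
```

Without a threshold, even background-level ozone produces attributable deaths. With X₀ in place, such concentrations contribute nothing, and higher concentrations count only their excess above X₀.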

Our calculations suggest that natural sources contribute relatively strongly to mortality attributable to air pollution (18%, about 600,000 per year), largely because of airborne desert dust. Recently we reported a global dust-related mortality rate of about 400,000 per year, substantially lower than the present estimate. Here we follow the GBD methodology, which is likely to yield an upper limit. Instead of the annual mean dust concentrations, Giannadaki et al. used the median concentrations, motivated by the intermittent nature of dust events. Their sensitivity calculations indicate that, had they used the mean concentration instead, their estimate of global dust-related mortality would have increased from 402,000 per year to 622,000 per year. Finally, if we assume that carbonaceous aerosols are five times more toxic than other compounds, including dust particles, the contribution by natural sources would decrease from about 600,000 per year (18%) to 360,000 per year (11%).
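The choice between annual mean and median matters for dust because the series is intermittent and strongly skewed: a few intense events raise the mean far above the median. The minimal sketch below uses an invented daily series, not observed data, to show the size of that gap. Any exposure-response calculation fed with the mean will therefore give higher mortality than one fed with the median.

```python
from statistics import mean, median

# Hypothetical year of daily dust PM2.5 (ug/m3): mostly low background,
# punctuated by a few intense dust events.
daily_dust = [5.0] * 330 + [150.0] * 35

annual_mean = mean(daily_dust)
annual_median = median(daily_dust)
print(f"mean = {annual_mean:.1f}, median = {annual_median:.1f}")
```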
Extended Data Figure 1 | Source categories responsible for the largest impact on mortality linked to outdoor air pollution in 2010 from a sensitivity calculation with carbonaceous aerosol having a five times larger impact than inorganic and crustal compounds. IND, industry; TRA, land traffic; RCO, residential energy use (for example, heating, cooking); BB, biomass burning; PG, power generation; AGR, agriculture; and NAT, natural.
Extended Data Figure 2 | Increase in mortality linked to outdoor air pollution from 2010 to 2050 (business-as-usual scenario). Units (colour coded): deaths per area of 100 km × 100 km. In the white areas, no additional mortality is projected.
Extended Data Figure 3 | Comparison of EMAC model calculated aerosol optical depth (AOD) with AERONET observations, using all available measurements worldwide in the year 2010. Although the comparison with individual data points shows a large scatter (left panel), the bias (MBE) is small, and time averaging improves the agreement. The middle panel shows a comparison of the monthly means, and the right panel the annual means (that is, showing individual stations), for which the mean error (root mean square error, RMSE) is smallest, the correlation highest and the bias absent. The long-dashed line indicates absolute agreement, the bold short-dashed lines agreement within a factor of two and the short-dashed lines agreement within a factor of ten.
Extended Data Table 1 | WHO regions, mortality strata, child and adult mortality characteristics, and the countries and territories included

Region | Stratum | Child mortality | Adult mortality | Countries and territories within stratum
Africa | Afr-D | High | High | Algeria, Angola, Benin, Burkina Faso, Cameroon, Cape Verde, Chad, Comoros, Equatorial Guinea, Gabon, Gambia, Ghana, Guinea, Guinea-Bissau, Liberia, Madagascar, Mali, Mauritania, Mauritius, Mayotte, Niger, Nigeria, Reunion, Saint Helena, Sao Tome and Principe, Senegal, Seychelles, Sierra Leone, Togo
Africa | Afr-E | High | Very high | Botswana, Burundi, Central African Republic, Congo, Côte d’Ivoire, Democratic Republic of the Congo, Eritrea, Ethiopia, Kenya, Lesotho, Malawi, Mozambique, Namibia, Rwanda, South Africa, Swaziland, Uganda, United Republic of Tanzania, Zambia, Zimbabwe
Americas | Amr-A | Very low | Very low | Canada, Cuba, Greenland, Saint Pierre and Miquelon, United States of America
Americas | Amr-B | Low | Low | Anguilla, Antigua and Barbuda, Argentina, Aruba, Bahamas, Barbados, Belize, Bermuda, Brazil, British Virgin Islands, Cayman Islands, Chile, Colombia, Costa Rica, Dominica, Dominican Republic, El Salvador, Falkland Islands, French Guiana, Grenada, Guadeloupe, Guyana, Honduras, Jamaica, Martinique, Mexico, Montserrat, Netherlands Antilles, Panama, Paraguay, Puerto Rico, Saint Kitts and Nevis, Saint Lucia, Saint Vincent and the Grenadines, Suriname, Trinidad and Tobago, Turks and Caicos Islands, United States Virgin Islands, Uruguay, Bolivarian Republic of Venezuela
Americas | Amr-D | High | High | Bolivia, Ecuador, Guatemala, Haiti, Nicaragua, Peru
Southeast Asia | Sear-B | Low | Low | Indonesia, Sri Lanka, Thailand
Southeast Asia | Sear-D | High | High | Bangladesh, Bhutan, Democratic People’s Republic of Korea, East Timor, India, Maldives, Myanmar, Nepal
Europe | Eur-A | Very low | Very low | Andorra, Austria, Belgium, Croatia, Cyprus, Czech Republic, Denmark, Faeroe Islands, Finland, France, Germany, Gibraltar, Greece, Guernsey, Iceland, Ireland, Isle of Man, Israel, Italy, Jersey, Liechtenstein, Luxembourg, Malta, Monaco, Netherlands, Norway, Portugal, San Marino, Slovenia, Spain, Svalbard, Sweden, Switzerland, United Kingdom
Europe | Eur-B | Low | Low | Albania, Armenia, Azerbaijan, Bosnia and Herzegovina, Bulgaria, Georgia, Kyrgyzstan, Poland, Romania, Serbia and Montenegro, Slovakia, Tajikistan, The former Yugoslav Republic of Macedonia, Turkey, Turkmenistan, Uzbekistan
Europe | Eur-C | Low | High | Belarus, Estonia, Hungary, Kazakhstan, Latvia, Lithuania, Republic of Moldova, Russia, Ukraine
Eastern Mediterranean | Emr-B | Low | Low | Bahrain, Iran, Jordan, Kuwait, Lebanon, Libyan Arab Jamahiriya, Oman, Qatar, Saudi Arabia, Syrian Arab Republic, Tunisia, United Arab Emirates
Eastern Mediterranean | Emr-D | High | High | Afghanistan, Djibouti, Egypt, Iraq, Morocco, Palestinian Territories, Pakistan, Somalia, Sudan, Yemen
Western Pacific | Wpr-A | Very low | Very low | Australia, Brunei Darussalam, Japan, New Zealand, Singapore
Western Pacific | Wpr-B | Low | Low | Cambodia, China, Cook Islands, Fiji, French Polynesia, Guam, Hong Kong, Kiribati, Lao People’s Democratic Republic, Macao, Malaysia, Marshall Islands, Pitcairn, Fed. States of Micronesia, Mongolia, Nauru, New Caledonia, Niue, Norfolk Island, Northern Mariana Islands, Palau, Papua New Guinea, Philippines, Republic of Korea, Samoa, Solomon Islands, Taiwan, Tokelau, Tonga, Tuvalu, Vanuatu, Vietnam, Wallis and Futuna
Extended Data Table 2 | Premature mortality related to PM2.5 and O3 in 2010

Mortality attributable to air pollution (deaths ×10³)

Region Stratum Population (×10⁶) | PM2.5: ALRI (<5 yr); IHD, CEV, COPD, LC (≥30 yr) | O3: COPD (≥30 yr) | Total
Africa Afr-D 379 77 37 62 9 1 2 188 
Afr-E 430 13 17 15 3 1 0 49
Amr-A 352 0 38 6 4 6 3 57 
Americas Amr-B 493 0 5 1 0 1 1 8 
Amr-D 85 0 0 0 0 0 0 1 
Eastern Mediterranean Emr-B 165 2 34 20 2 1 3 62
Emr-D 437 54 80 67 11 3 9 224
Eur-A 410 0 73 33 7 16 3 132 
Europe Eur-B 229 1 57 34 3 7 2 104 
Eur-C 228 0 110 28 2 4 0 144 
Southeast Asia Sear-B 324 3 37 23 7 3 2 75 
Sear-D 1,438 61 290 227 117 12 80 787 
Western Pacific Wpr-A 156 0 11 9 0 5 2 27 
Wpr-B 1,656 19 288 784 209 102 33 1,435 
World 6,783 230 1,079 1,311 374 161 142 3,297 
Extended Data Table 3 | Premature mortality by PM2.5- and O3-related diseases in 2010 in countries where it exceeds 9,000 individuals per year (<5 and ≥30 years old)

Country | Deaths in 2010 | Natural | Industry | Land traffic | Residential energy | Power generation | Biomass burning | Agriculture
China 1,357,353 118,954 106,754 44,751 435,763 237,324 18,414 395,390 
India 644,993 74,145 42,336 30,070 325,604 89,130 42,163 41,541 
Pakistan 110,571 63,147 2,478 3,389 34,707 2,761 2,108 1,977 
Bangladesh 91,923 0 6,117 5,656 50,382 13,697 6,418 9,652 
Nigeria 89,022 68,479 176 85 12,006 258 7,554 462 
Russia 67,152 630 5,193 7,731 4,885 14,606 5,477 28,628 
USA 54,905 1,290 3,297 11,435 3,192 16,929 2,537 16,221 
Indonesia 52,417 71 1,814 1,244 31,498 2,379 14,338 1,070 
Ukraine 51,238 55 4,632 5,188 3,011 9,459 2,326 26,563 
Vietnam 44,097 0 3,627 1,686 22,575 5,486 5,378 5,343 
Egypt 35,322 32,651 210 450 190 816 61 941 
Germany 34,422 0 4,452 6,928 2,684 4,402 279 15,675 
Turkey 31,943 4,912 3,414 3,487 2,812 6,194 1,851 9,269 
Iran 26,108 21,175 662 969 311 1,101 230 1,656 
Japan 25,516 0 4,567 2,526 3,046 4,458 1,154 9,763 
Sudan 24,255 22,249 59 47 200 133 1,488 77 
Myanmar 22,537 10 1,082 842 8,287 2,662 8,707 944 
Italy 20,809 1,251 2,930 3,519 1,454 3,192 376 8,085 
Iraq 20,335 18,513 209 390 109 510 91 510 
Thailand 19,843 0 2,211 1,469 5,207 2,944 6,529 1,481 
France 17,800 0 2,515 3,152 2,468 2,113 211 7,339 
Dem. Rep. Korea (N) 16,783 0 1,996 770 3,445 3,467 715 6,386 
United Kingdom 15,488 0 1,627 3,091 854 2,412 63 7,438 
Algeria 14,954 11,262 1,113 656 194 773 107 847 
Dem. Republic Congo 14,880 901 193 45 1,405 121 12,119 92 
Romania 14,633 0 1,336 1,825 1,403 3,225 573 6,270 
Saudi Arabia 14,600 13,708 165 165 46 308 38 167 
Poland 14,561 0 1,451 1,886 2,273 2,265 372 6,310
Korea (S) 14,352 0 2,045 1,108 2,049 2,803 321 6,024 
Morocco (+W. Sahara) 14,217 10,929 966 346 224 856 99 795 
Niger 13,061 12,893 9 0 89 16 32 19 
Uzbekistan 11,598 7,341 671 590 578 623 265 1,526 
Nepal 10,926 56 641 510 7,481 1,090 681 465 
Mali 9,444 9,060 1 0 101 0 273 6 
Ghana 9,317 5,552 105 68 525 68 2,922 75
Burkina Faso 9,295 8,851 3 0 127 20 256 35 
World 3,297,370 596,895 226,137 163,852 1,002,370 464,748 179,268 664,100 
Extended Data Table 4 | Premature mortality related to PM2.5 and O3 in 2025

Mortality attributable to air pollution (deaths ×10³)

Region Stratum Population (×10⁶) | PM2.5: ALRI (<5 yr); IHD, CEV, COPD, LC (≥30 yr) | O3: COPD (≥30 yr) | Total
Africa Afr-D 538 102 60 99 14 2 5 282 
Afr-E 597 17 30 27 4 1 2 81 
Amr-A 395 0 48 8 5 7 6 74 
Americas Amr-B 561 0 9 3 1 1 3 17 
Amr-D 105 0 1 0 0 0 0 1 
Eastern Mediterranean Emr-B 197 2 55 33 3 2 5 100
Emr-D 579 61 133 109 17 6 19 345
Eur-A 423 0 79 37 8 17 5 146 
Europe Eur-B 246 1 74 46 4 9 4 138 
Eur-C 221 0 123 34 3 5 0 165 
Southeast Asia Sear-B 362 3 54 38 11 5 7 118 
Sear-D 1,697 76 461 398 201 21 155 1,312 
Western Pacific Wpr-A 158 0 11 11 1 5 4 32 
Wpr-B 1,760 18 377 1,038 284 138 51 1,906 
World 7,838 280 1,515 1,881 556 219 266 4,717 
Extended Data Table 5 | Premature mortality related to PM2.5 and O3 in 2050

Mortality attributable to air pollution (deaths ×10³)

Region Stratum Population (×10⁶) | PM2.5: ALRI (<5 yr); IHD, CEV, COPD, LC (≥30 yr) | O3: COPD (≥30 yr) | Total
Africa Afr-D 874 137 121 203 28 4 8 501 
Afr-E 933 21 64 58 10 1 4 158 
Amr-A 451 0 59 10 6 10 7 92 
Americas Amr-B 609 0 15 5 1 1 4 26 
Amr-D 131 0 1 0 0 0 0 1 
Eastern Mediterranean Emr-B 222 2 80 50 5 3 7 147
Emr-D 799 64 241 196 32 10 33 576
Eur-A 431 0 85 43 9 20 5 162
Europe Eur-B 252 1 95 68 6 12 5 187 
Eur-C 203 0 127 45 3 5 1 181 
Southeast Asia Sear-B 382 3 69 51 14 7 8 152 
Sear-D 1,950 101 796 755 405 41 219 2,317 
Western Pacific Wpr-A 149 0 10 9 0 4 4 27 
Wpr-B 1,712 16 403 1,110 309 151 53 2,042 
World 9,098 346 2,166 2,604 828 270 358 6,572 
Extended Data Table 6 | Population and premature mortality (deaths per year) related to PM2.5 and O3 in the most polluted megacities and conurbations in 2010, 2025 and 2050

Megacity | 2010 population (10⁶) | 2010 deaths (×10³) | 2025 population (10⁶) | 2025 deaths (×10³) | 2050 population (10⁶) | 2050 deaths (×10³)
London 8.1 2.8 9.1 3.4 10.2 4.2 
Paris 8.4 3.1 9.2 3.8 10.2 4.6 
Moscow 14.9 8.6 14.4 10.8 13.1 11.7 
Po Valley 3.4 1.3 3.4 1.4 3.2 1.4 
Istanbul 11.1 5.6 13.2 8.5 14.5 13.2 
Teheran 9.7 2.9 11.1 4.8 11.4 6.9 
Cairo 12.5 6.0 15.9 8.2 19.8 11.4 
Lagos 8.3 3.7 12.7 6.0 22.0 11.2 
Johannesburg¹ 6.9 1.5 7.7 2.3 8.6 3.8
Karachi 11.9 7.3 15.4 11.4 19.4 17.9 
Mumbai² 18.0 10.2 22.1 17.4 26.8 33.1
Delhi 22.5 19.7 27.8 31.1 33.3 52.0 
Kolkata 20.3 13.5 28.4 26.6 38.8 54.8 
Dhaka 22.8 13.1 31.2 26.4 38.2 49.9 
Jakarta 22.5 10.4 26.1 16.4 29.0 22.1 
Chengdu 6.2 7.4 6.4 9.5 5.9 9.7 
Beijing 10.8 13.7 11.3 17.3 10.4 17.7 
Tianjin 3.7 4.9 3.9 6.2 3.6 6.3 
Shanghai 14.1 14.9 14.3 18.9 13.2 19.4 
Seoul 20.8 6.6 21.7 8.5 20.3 8.7 
Tokyo 29.2 6.0 28.1 6.4 24.2 5.4 
Osaka 13.5 2.8 12.8 3.1 10.9 2.6 
Hong Kong 6.9 2.6 7.6 3.7 8.8 4.4 
Pearl River area 53.1 49.2 56.0 65.2 52.9 67.4 
Manila 19.8 0.6 26.5 2.3 37.3 4.5 
Bangkok 8.8 3.1 9.5 4.9 9.2 5.7 
New York 12.5 3.2 14.5 4.2 17.5 5.2 
Los Angeles 12.2 4.1 14.6 5.2 17.7 7.0 
Mexico City 10.7 1.6 12.3 3.3 13.9 5.3 


¹ Includes Pretoria.
² Includes east suburb.
The names of the megacities have been colour coded according to the WHO regions: Europe, black font; Eastern Mediterranean, blue; Africa, red; Southeast Asia, green; Western Pacific, brown; Americas, purple.
Extended Data Table 7 | Premature mortality related to PM2.5 and O3 for the population aged <5 years and ≥30 years

Mortality attributable to air pollution (deaths ×10³)

WHO region Year Population (×10⁶) | PM2.5: ALRI (<5 yr); IHD, CEV, COPD, LC (≥30 yr) | O3: COPD (≥30 yr) | Total
Africa 2010 809 90 55 77 11 2 2 237 
2010* 37 219 
Americas 2010 930 0 44 8 4 7 5 68 
2010* 35 59 
Eastern Mediterranean 2010 602 56 115 86 12 5 12 286
2010* 104 275
Europe 2010 867 1 239 95 13 27 6 381 
2010* 213 355 
Southeast Asia 2010 1,762 64 327 250 124 15 82 862 
2010* 169 704 
Western Pacific 2010 1,812 19 299 794 209 107 35 1,463 
2010* 250 1,414 
World 2010 6,783 230 1,079 1,311 374 161 142 3,297 
2010* 808 3,026 


*In these rows, IHD mortality related to residential energy use (RCO) and biomass burning has been excluded; only the IHD and total values are given.
Non-adaptive plasticity potentiates rapid adaptive evolution of gene expression in nature


Cameron K. Ghalambor, Kim L. Hoke, Emily W. Ruell, Eva K. Fischer, David N. Reznick & Kimberly A. Hughes


Phenotypic plasticity is the capacity for an individual genotype to produce different phenotypes in response to environmental variation. Most traits are plastic, but the degree to which plasticity is adaptive or non-adaptive depends on whether environmentally induced phenotypes are closer to or further away from the local optimum. Existing theories make conflicting predictions about whether plasticity constrains or facilitates adaptive evolution. Debate persists because few empirical studies have tested the relationship between initial plasticity and subsequent adaptive evolution in natural populations. Here we show that the direction of plasticity in gene expression is generally opposite to the direction of adaptive evolution. We experimentally transplanted Trinidadian guppies (Poecilia reticulata) adapted to living with cichlid predators to cichlid-free streams, and tested for evolutionary divergence in brain gene expression patterns after three to four generations. We find 135 transcripts that evolved parallel changes in expression within the replicated introduction populations. These changes are in the same direction as those exhibited in a native cichlid-free population, suggesting rapid adaptive evolution. We find that 89% of these transcripts exhibited non-adaptive plastic changes in expression when the source population was reared in the absence of predators, as these changes are in the opposite direction to the evolved changes. By contrast, the remaining transcripts, which exhibit adaptive plasticity, show reduced population divergence. Furthermore, the most plastic transcripts in the source population evolved reduced plasticity in the introduction populations, suggesting strong selection against non-adaptive plasticity. These results support models predicting that adaptive plasticity constrains evolution, whereas non-adaptive plasticity potentiates evolution by increasing the strength of directional selection. The role of non-adaptive plasticity in evolution has received relatively little attention; however, our results suggest that it may be an important mechanism that predicts evolutionary responses to new environments.

A long-standing problem in evolutionary biology is to understand the relationship between environmentally induced variation observed within a generation and genetically based evolutionary changes between generations. It has long been recognized that the expression of traits is plastic: the same genotype can produce a range of phenotypes in response to different environmental cues. However, the causal relationship between a trait’s plasticity and that trait’s evolution remains an unresolved and contentious problem. Traditional models of adaptive evolution ignored any role for plasticity, because environmentally induced plasticity was viewed as non-heritable variation. Current models recognize that environments can cause predictable patterns of plasticity that are either adaptive or non-adaptive with respect to the local phenotypic optimum; such plasticity may influence evolutionary change by altering the distribution of phenotypes upon which selection acts. For example, plasticity is adaptive when the phenotype is altered in the same direction favoured by natural selection in that environment. Some models predict that adaptive plasticity weakens the strength of directional selection and slows adaptive evolution. Other models suggest that adaptive plasticity is a critical first step in the process of adaptive evolution (for example, via genetic assimilation or accommodation). In these models, plasticity may, for instance, increase population persistence in new environments (the Baldwin effect) and allow more time for selection to act on heritable variation. In contrast, plasticity is non-adaptive when a population encounters an environment that induces the production of phenotypes further away from the local optimum, resulting in a negative relationship between the direction of plasticity and the direction of adaptive evolution. Non-adaptive plasticity reduces relative fitness and is predicted to increase the strength of directional selection because traits are further from the phenotypic optimum. The resulting evolutionary response is sometimes referred to as ‘genetic compensation’ or ‘counter-gradient variation’. Laboratory selection experiments have found support for both positive (adaptive) and negative (non-adaptive) relationships between the direction of plastic responses and the direction of evolution. However, testing such relationships in natural populations has been challenging because comparisons between ancestral and derived populations typically occur long after the populations have diverged. Here, we test the relationship between plasticity and the early stages of evolutionary divergence using experiments in nature. We assess both ancestral plasticity in the source population and evolved changes in replicated derived populations by comparing plastic and evolved patterns of gene expression.
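The verbal argument about the strength of directional selection can be made concrete with a standard textbook quantitative-genetic model. Assume Gaussian stabilizing selection of width ω² around an optimum θ, and a phenotype distribution with mean z̄ and variance P. The directional selection gradient is then β = (θ − z̄)/(ω² + P), and the breeder's equation gives a one-generation response Δz̄ = Gβ. This is our illustration, not a model fitted in this study, and all parameter values are hypothetical. A plastic shift away from θ increases β and the response; a shift towards θ decreases them.

```python
def selection_gradient(mean_z, theta, omega2, p_var):
    """Directional selection gradient under Gaussian stabilizing selection
    with width omega2 around optimum theta and phenotypic variance p_var."""
    return (theta - mean_z) / (omega2 + p_var)


def response(mean_z, theta, omega2, p_var, g_var):
    """Breeder's-equation response over one generation: delta z = G * beta."""
    return g_var * selection_gradient(mean_z, theta, omega2, p_var)


# Hypothetical parameters: the new environment's optimum lies at +2.
THETA, OMEGA2, P_VAR, G_VAR = 2.0, 10.0, 1.0, 0.4
ANCESTRAL_MEAN = 0.0
scenarios = {
    "adaptive plasticity": ANCESTRAL_MEAN + 0.5,      # induced shift towards theta
    "no plasticity": ANCESTRAL_MEAN,
    "non-adaptive plasticity": ANCESTRAL_MEAN - 0.5,  # induced shift away from theta
}

for name, z in scenarios.items():
    print(f"{name:>24}: response = {response(z, THETA, OMEGA2, P_VAR, G_VAR):.3f}")
```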

We quantified gene expression in Trinidadian guppies derived from natural populations and from populations undergoing early divergence following an experimental translocation. Individuals came from a population that experiences high mortality from fish predators (high-predation, denoted as HP), particularly the pike cichlid (Crenicichla frenata). These individuals were introduced into each of two low-predation sites lacking cichlids: ‘Intro1’ and ‘Intro2’ (Extended Data Fig. 1). Thirty-eight gravid females and 38 mature males were introduced into each stream. One year after the introduction (3-4 guppy generations), guppies were collected from three sources: the ancestral HP source population, the descendant introduction populations (Intro1 and Intro2), and a naturally colonized low-predation guppy population (denoted as LP) from the same drainage (Methods). The natural LP population represents an older evolutionary descendant of the HP source population, adapted to the same predation regime as the experimental populations. It thus provides an a priori prediction for the expected direction of evolutionary change.

To assess plastic and evolved changes in transcription, we bred wild-caught fish under common laboratory conditions for two generations and generated unique family lines within each of the four populations. Two generations of rearing in a common environment controls for environmental, maternal and other non-heritable sources of variation. Within 24 h of birth, second-generation full siblings of each family were randomly split between tanks that differed in exposure to chemical predator cues. Siblings reared with predator cues were raised in recirculating units that housed a cichlid within the water supply. Cichlids were fed two guppies per day. Predator cues included both predator kairomones from the cichlid and any alarm pheromones from guppies, simulating the ancestral olfactory environment. Guppies reared without predator cues were housed in identical recirculating units without the cichlid predator, simulating the derived environment. Differences in transcription between siblings reared in these two environments represent predator-induced plasticity in gene expression. In contrast, differences between populations measured under the same conditions for multiple generations represent heritable differences.
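The two quantities defined above can be written down directly for a single transcript. This minimal sketch makes two simplifications of our own in place of the statistical models used in the study. Plasticity is taken as the mean within-family difference between siblings reared without and with predator cues. Evolved divergence is taken as the difference between population means measured in the same no-cue environment. The function names and family values are hypothetical.

```python
from statistics import mean


def plasticity(no_cue, cue):
    """Mean within-family change when predator cues are removed (no cue minus cue).

    no_cue[i] and cue[i] are expression values of full siblings from family i."""
    return mean(n - c for n, c in zip(no_cue, cue))


def evolved_divergence(derived_no_cue, source_no_cue):
    """Difference between population means measured in the same (no-cue) environment."""
    return mean(derived_no_cue) - mean(source_no_cue)


# Hypothetical family values for a single transcript (for example, log expression).
hp_cue = [2.0, 2.2, 2.1]
hp_no_cue = [1.0, 1.4, 1.3]
intro_no_cue = [2.5, 2.7, 2.6]

p = plasticity(hp_no_cue, hp_cue)                # < 0: expression falls without cues
e = evolved_divergence(intro_no_cue, hp_no_cue)  # > 0: expression evolved upwards
print(f"plasticity = {p:.3f}, evolved divergence = {e:.3f}")
```

Here the plastic and evolved changes have opposite signs, so this transcript would count as showing non-adaptive plasticity.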

To determine whether the introduction populations showed evidence for adaptive evolutionary divergence, we measured patterns of transcription in all four populations under the derived rearing environment. Using high-throughput RNA sequencing, we measured the abundance of 37,493 messenger RNA transcripts expressed in whole brains of mature males reared without predator cues (mean age = 124.03 days, range = 118-154 days). We used multivariate between-group principal components analysis (Methods) to visualize overall transcription differences among the four populations (Fig. 1). Two major axes explained 74.5% of the variation. Principal component 1 (PC1; 44.4% of variation) separated the naturally occurring LP population from the natural HP and introduction populations, and thus appears to reflect long-term divergence between these populations. PC2 (30.1% of variation) separated the HP source population from the two introduction populations and the natural LP population, thus capturing a signal of rapid and parallel evolutionary divergence to the LP environment (Fig. 1). Genetic drift, founder effects and unique attributes of each introduction stream would be expected to produce independent genetic changes in the introduction populations. By contrast, Intro1 and Intro2 changed in parallel and in the same direction as the natural LP population, which supports the interpretation that PC2 describes rapid adaptive evolution. Indeed, the rates of evolutionary divergence in gene expression between the source population and the introduction populations for the top 500 transcripts loading on PC2 are comparable to the rapid rates of evolution observed in life history and morphology during previous experimental introductions of guppies (Methods and Extended Data Fig. 2a, b). These rates, expressed as median haldanes (change in phenotypic standard deviations per generation), were 0.256 in Intro1 and 0.226 in Intro2.

[Figure 1 plot: individual families of the four populations on axes PC1 and PC2]
Figure 1 | Rapid evolutionary divergence in gene expression as measured in second-generation laboratory-born guppies derived from the wild. Shown is a principal components analysis of all 37,493 expressed genes in the four populations. HP is a naturally occurring high-predation population that is the source population for the two experimentally introduced populations, Intro1 and Intro2. LP is a naturally occurring low-predation population. Points represent individual families within each population, and are connected by solid lines. Dashed lines represent the major and minor axes of the confidence ellipse for each population.
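This paragraph relies on two calculations, sketched here in minimal form. The first is a between-group PCA: axes are computed from the population means, and individual families are then projected onto them. The second is an evolutionary rate in haldanes: the difference in means, divided by a pooled standard deviation, per generation. The function names, the pooled-SD choice and the random input data are our assumptions. The study's exact procedure is given in its Methods.

```python
import numpy as np


def between_group_pca(X, groups):
    """Between-group PCA: axes come from the group means; samples are projected onto them.

    X: (n_samples, n_features) array; groups: one label per sample.
    Returns (scores, explained_variance_ratio, axes)."""
    groups = np.asarray(groups)
    labels = np.unique(groups)
    means = np.vstack([X[groups == g].mean(axis=0) for g in labels])
    centre = means.mean(axis=0)
    # SVD of the centred group-mean matrix; rows of vt are the between-group axes.
    _, s, vt = np.linalg.svd(means - centre, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = (X - centre) @ vt.T
    return scores, explained, vt


def pooled_sd(sd1, n1, sd2, n2):
    return np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def haldanes(mean_anc, mean_der, sd_anc, sd_der, n_anc, n_der, generations):
    """Change in pooled phenotypic standard deviations per generation."""
    return (mean_der - mean_anc) / pooled_sd(sd_anc, n_anc, sd_der, n_der) / generations


# Hypothetical data: 4 populations x 3 families, 50 transcripts.
rng = np.random.default_rng(1)
labels = np.repeat(["HP", "Intro1", "Intro2", "LP"], 3)
X = rng.normal(size=(12, 50))
scores, explained, axes = between_group_pca(X, labels)
print(np.round(explained, 3))  # with 4 groups, at most 3 axes carry variance

print(haldanes(10.0, 11.0, 2.0, 2.0, 20, 20, generations=2))
```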

We next needed to distinguish transcripts that evolved in the introduction populations as a result of selection from those that changed as a result of other processes. To do so, we identified transcripts that exhibited highly significant parallel evolutionary change in both introduction populations and that diverged in the same direction in the natural LP population. Permuted data sets (n = 250) were generated by randomly reassigning population labels to individual samples. We then used general linear statistical models to assess divergence in the two introduction populations and the natural LP population (that is, HP versus Intro1, HP versus Intro2 and HP versus LP) for each transcript (Methods). If the test statistic for each of the three contrasts fell in the extreme 5% of the distribution of the permutation test statistics, and the contrasts all had the same sign, we called the transcript concordantly differentially expressed (CDE). We found 135 transcripts that met these stringent criteria, many more than observed in the permuted data sets (median = 6, interquartile range = 3-14; Methods and Supplementary Table 1). By contrast, only one transcript diverged significantly in opposite directions in the two descendant introduction populations, consistent with expectations based on the distribution of permuted values (median = 1, interquartile range = 1-2). These 135 CDE transcripts loaded highly on PC2 (the median rank of the PC2 loadings for the CDE transcripts was 361 out of 37,493 total transcripts). The prevalence of these parallel changes suggests that this subset of transcripts evolved through the direct or indirect effects of natural selection, because genetic drift would have produced discordant as well as concordant evolution in the descendant introduction populations. Indeed, divergence in transcription between the ancestral and introduction populations greatly exceeded allele frequency divergence in putatively neutral microsatellite loci (Extended Data Table 1). Collectively, these results demonstrate rapid and repeatable patterns of adaptive evolutionary divergence in transcription, similar to what has been observed for other fitness-related guppy traits following the colonization of low-predation environments.
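The CDE criterion combines a permutation null distribution with a sign check across the three contrasts. In this minimal sketch, three choices are ours. Welch's t stands in for the general linear models actually used. "Extreme 5%" is read as the top 5% of absolute permuted statistics for each contrast. The expression values are hypothetical, and all function names are illustrative.

```python
import numpy as np

CONTRASTS = (("HP", "Intro1"), ("HP", "Intro2"), ("HP", "LP"))


def welch_t(a, b):
    """Welch's t for mean(b) - mean(a); a stand-in for the paper's linear models."""
    return (b.mean() - a.mean()) / np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))


def contrast_stats(expr, labels):
    """Test statistics for the three contrasts of one transcript."""
    return np.array([welch_t(expr[labels == a], expr[labels == b]) for a, b in CONTRASTS])


def null_stats(expr, labels, n_perm=250, seed=0):
    """Statistics after randomly reassigning population labels to samples."""
    rng = np.random.default_rng(seed)
    return np.array([contrast_stats(expr, rng.permutation(labels)) for _ in range(n_perm)])


def is_cde(observed, null, tail=0.05):
    """All three contrasts in the extreme `tail` of |null| and all with the same sign."""
    cutoffs = np.quantile(np.abs(null), 1.0 - tail, axis=0)
    extreme = bool(np.all(np.abs(observed) >= cutoffs))
    same_sign = bool(observed[0] != 0 and np.all(np.sign(observed) == np.sign(observed[0])))
    return extreme and same_sign


# Hypothetical expression of one transcript in 3 samples per population.
labels = np.repeat(["HP", "Intro1", "Intro2", "LP"], 3)
expr = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 4.9, 5.05, 5.3, 5.5, 5.6, 5.8])
observed = contrast_stats(expr, labels)
print(observed.round(1), is_cde(observed, null_stats(expr, labels)))
```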

Given the evidence for rapid evolution of transcription, we deter- 
mined if the pattern of ancestral plasticity in the HP source population 
predicted adaptive evolution in the descendant introduction popula- 
tions. We assessed plasticity in the HP population by measuring the 
change in transcript abundance of full siblings reared with and without 
the predator cue (that is, simulating the ancestral high-predation and 
derived low-predation environments). If plasticity in transcript 
abundance was in the same direction as the parallel divergence 
observed in CDE transcripts, we considered plasticity to be adaptive. 
If the plastic changes were in the opposite direction as the evolved 
changes in CDE transcripts, we considered the plasticity to be non- 
adaptive (see Extended Data Fig. 3). We found a robust pattern of non-adaptive plasticity predicting evolutionary change in CDE transcripts: when HP fish were reared without the predator cue, the change in transcript abundance was overwhelmingly in the opposite direction to that of evolved changes in the descendant introduction populations (Fig. 2). The negative association between the direction of plasticity and the direction of evolution was highly significant (χ² = 89.9, d.f. = 1), lying outside the range of all 250 permuted χ² values (range = 0.0–55.9), with 89% (120 of 135) of all transcripts exhibiting a plastic response opposite to the direction of evolution (see grey points in Fig. 2). For the remaining 11% (15 of 135) of transcripts, in which the directions of plasticity and evolution aligned, the degree of plasticity was negligible (see black points in Fig. 2). The correlation between ancestral plasticity and evolution (r = −0.82) is substantially more negative than correlations generated from a randomization test (P < 0.0001; Fig. 2). These results suggest that plasticity potentiates rapid adaptive evolution, not because plasticity is adaptive, as is assumed in many evolutionary models, but because it is non-adaptive and under stronger selection to change (Fig. 2). The same pattern is observed when we restrict the analysis to a separate data set that included 565 transcripts exhibiting significant plasticity in response to the rearing treatments in the HP source population (Supplementary Table 2 and Extended Data Fig. 4).

Figure 2 | Rapid evolutionary divergence is highly correlated with non-adaptive plasticity. Shown is a scatter plot of ancestral plasticity (change in transcript abundance in response to the absence of cichlid predator cues) against adaptive evolutionary divergence (135 concordantly differentially expressed transcripts) in the descendant populations transplanted to streams lacking cichlid predators. Grey points denote transcripts exhibiting non-adaptive plasticity, and black points denote adaptive plasticity. Inset shows the distribution of the Spearman rank correlations between evolutionary divergence and ancestral plasticity from 1,000 permuted correlation values for the 135 concordantly differentially expressed transcripts, with the arrow indicating the observed correlation, which is substantially more negative than all permuted values.
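The grey/black classification in Fig. 2 reduces to comparing the signs of two per-transcript differences. A minimal Python sketch, assuming plasticity is taken as the HP no-cue mean minus the HP with-cue mean and evolution as the introduction no-cue mean minus the HP no-cue mean (these sign conventions and the function name are our assumptions, not the authors' code):

```python
from typing import Sequence


def direction_summary(
    hp_cue: Sequence[float],
    hp_no_cue: Sequence[float],
    intro_no_cue: Sequence[float],
) -> tuple[int, int]:
    """Return (opposite, aligned) counts of plastic versus evolved directions.

    Each argument holds one mean expression value per transcript, in the same
    transcript order.
    """
    if not len(hp_cue) == len(hp_no_cue) == len(intro_no_cue):
        raise ValueError("all inputs must cover the same transcripts")
    opposite = aligned = 0
    for cue, base, derived in zip(hp_cue, hp_no_cue, intro_no_cue):
        plasticity = base - cue  # HP response to removing the predator cue
        evolution = derived - base  # evolved change, both groups reared without cue
        if plasticity * evolution < 0:
            opposite += 1  # non-adaptive plasticity (grey points in Fig. 2)
        elif plasticity * evolution > 0:
            aligned += 1  # adaptive plasticity (black points in Fig. 2)
        # a zero difference on either axis leaves the transcript unclassified
    return opposite, aligned
```

With the counts reported above (120 opposite and 15 aligned among 135 CDE transcripts), the opposite fraction is 120/135 ≈ 0.89.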

The magnitude of plasticity can also evolve in response to selection’"°. If natural selection acts most strongly on transcripts exhibiting non-adaptive plasticity, we predicted that plasticity should evolve to be reduced in the descendant introduction populations. We tested this prediction by comparing plasticity in the ancestral source population with that in the derived introduction populations for the subset of transcripts that were CDE. The magnitude of plasticity decreased in the introduction populations (median change = −11%, sign test M = −45.5, P < 0.001). Moreover, the change in plasticity in these descendant populations was negatively associated with the magnitude of ancestral plasticity (P < 0.001 based on a randomization test; Fig. 3), so that the transcripts with the greatest ancestral plasticity showed the largest declines. This accords with the idea that selection acts more strongly to decrease plasticity in those transcripts showing the greatest non-adaptive plasticity. Thus, initially non-adaptive plastic responses to new environments may be transient, because selection may act to rapidly reduce their magnitude.
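The M statistic reported above matches the form used by SAS PROC UNIVARIATE (the analyses were run in SAS 9.4), M = (n⁺ − n⁻)/2, where n⁺ and n⁻ count positive and negative changes and zero changes are discarded. A minimal sketch of the sign test with an exact two-sided binomial P value, assuming that convention and applied to the per-transcript change in plasticity magnitude (the function name is ours):

```python
from math import comb
from typing import Sequence


def sign_test(changes: Sequence[float]) -> tuple[float, float]:
    """Return (M, two-sided exact P) for a one-sample sign test against zero.

    M = (n_pos - n_neg) / 2, the form reported by SAS PROC UNIVARIATE;
    zero changes are dropped before counting.
    """
    n_pos = sum(1 for c in changes if c > 0)
    n_neg = sum(1 for c in changes if c < 0)
    n = n_pos + n_neg
    if n == 0:
        raise ValueError("all changes are zero")
    m_stat = (n_pos - n_neg) / 2
    k = min(n_pos, n_neg)
    # Under H0 the number of positive changes is Binomial(n, 0.5); double the
    # smaller tail for a two-sided test and cap the result at 1.
    tail = sum(comb(n, j) for j in range(k + 1)) / 2**n
    return m_stat, min(1.0, 2 * tail)
```

Under this convention, and if none of the 135 changes were exactly zero, M = −45.5 would correspond to 22 increases and 113 decreases in plasticity.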

Attempts to model the effects of plasticity on subsequent adaptive evolution often assume that plasticity is adaptive. However, when populations experience novel environments, as when we experimentally transplanted guppies, many of the initial plastic responses are likely to be non-adaptive, because selection has not had an opportunity to act on the genetic variation for plasticity” *. In such cases, both adaptive and non-adaptive plastic responses would be expected by chance, but traits exhibiting adaptive plasticity should be under weaker directional selection than traits exhibiting non-adaptive plasticity, which lie further from the new phenotypic optimum’’”. Indeed, both theoretical and empirical studies show that adaptive plasticity reduces directional selection”’*’’. Although we were unable to directly estimate
the strength of selection on transcript abundance phenotypes, previous introduction experiments have demonstrated strong directional selection and rapid adaptation in response to low-predation environments”*”*. If traits exhibiting non-adaptive plasticity are under stronger directional selection, then newly established populations will probably face a dual challenge if they are to persist and avoid extinction. First, they must overcome the fitness costs associated with strong directional selection on non-adaptive responses, including declines in population size; second, they must harbour enough genetic variation to respond rapidly to selection’’®’*. Because heritable genetic variation for transcription appears to be common”, the potential for rapid adaptation may ameliorate one set of costs. However, other costs may be more difficult to avoid, as models suggest that population size, the distance of a non-adaptive trait from the local optimum, and the relationship of that trait to fitness will ultimately determine whether populations persist’®”*. In the case of the introductions here, such costs may have been reduced, because individuals were transplanted to relatively more ‘benign’ conditions, in which high predator-induced mortality was replaced with increased competition, reduced food availability, and other environmental factors characterizing the low-predation streams”.

Figure 3 | Rapid evolution of reduced plasticity. Shown is a scatter plot of the absolute values for the magnitude of ancestral plasticity (the normalized difference in transcript abundance between the presence and absence of cichlid cues) against the change in plasticity between the source and introduction populations. Inset shows the distribution of the Spearman rank correlations between the magnitude of plasticity in the ancestral population and the change in plasticity in the introduction populations from 1,000 permuted correlation values for the 135 concordantly differentially expressed transcripts, with the arrow indicating the observed correlation, which is substantially more negative than all permuted values.

Understanding the role of phenotypic plasticity in adaptive evolu- 
tion remains a contentious problem in evolutionary biology, in part 
because few studies have been able to capture the initial patterns of 
plasticity and subsequent adaptive divergence of traits in natural popu- 
lations. Nevertheless, it is during the early stages of adaptive divergence 
that selection in new environments is likely to be strongest”"®”*, and 
when plasticity will either reduce or exacerbate the initial mismatch 
between the mean and optimal phenotypic responses*°. Recent work 
in these same guppy populations documents a similar pattern in which 
non-adaptive plasticity potentiates rapid evolution of growth rate”,
suggesting a general pattern that extends to other phenotypic traits.
Although such results are consistent with many models of how selection
acts on phenotypes®”’, the role of non-adaptive plasticity in adaptive 
evolution remains understudied, despite arguments that it may be a 
common, but cryptic, form of evolution’’’*. More generally, under- 
standing when and how plasticity affects evolutionary response is 
critical for predicting the short- and long-term effects of environmental change on organisms. Predictive evolutionary models of phenotypic plasticity also have practical importance. For example, disease states within organisms respond plastically to treatments and also evolve; thus, gene expression profiles can be used (as was
done here) to predict how response to treatment influences disease 
progression”. Additional experimental evolution studies, especially 
those conducted in natural environments, will be critical for validating 
and parameterizing future models of how plasticity influences evolu- 
tionary change. 


Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.

METHODS


No statistical methods were used to predetermine sample size. The investigators 
were not blinded to allocation during experiments and outcome assessment. 
Study system and populations. Guppies are a model system in evolutionary 
biology because they provide an opportunity to study rapid adaptive evolution 
in the wild’*”*. In lowland rivers, guppies occur in diverse fish communities where 
they experience high mortality from a number of fish predators. In small upstream 
tributaries, guppies occur in simpler communities, typically co-existing only with
the killifish (Rivulus hartii), which poses little risk to adult guppies, resulting in a
low predator-induced mortality rate*”®. Past research has shown that numerous
life history, behavioural, and morphological traits vary between these contrasting 
environments, and that these differences can evolve rapidly following experi- 
mental introductions*”*. We sampled four populations of guppies within the 
Guanapo River drainage in the Northern Range Mountains of Trinidad, West 
Indies (Extended Data Fig. 1). The first population, hereafter referred to as HP, 
is a naturally occurring population subject to high predation in the lower Guanapo 
river drainage that contains a variety of predator species, including the common 
predator on guppies, the pike cichlid*’°. The second population, hereafter 
referred to as LP, represents a native low-predation population from the same 
drainage and was sampled from the upstream Taylor tributary of the Guanapo 
river, where guppies co-exist with only R. hartii. R. hartii are gape-limited omnivores that prey primarily on juvenile guppies*”°. The remaining two populations were experimentally established in two low-predation tributaries (the Lower Lalaja and the Upper Lalaja) within the Guanapo drainage.

Introduction experiments. In March 2008, HP guppies were introduced into the 
Lower Lalaja (denoted as Intro1) and Upper Lalaja tributaries (denoted as Intro2)
of the Guanapo drainage”. The two introduction populations were established in 
100-m reaches of these small, first-order tributaries. The upper limit of the intro- 
duction reach on the Lower Lalaja was bounded by a waterfall, which was arti- 
ficially enhanced to prevent emigration and the establishment of populations 
above the streams receiving introductions. The upper limit of the Upper Lalaja 
introduction reach had a natural barrier. The lower limit of both introduction 
sites had natural barriers, which blocked immigration from downstream popula- 
tions of guppies. The streams below these downstream barriers were also 
guppy-free before our introduction and were separated from the main river by 
additional barriers. 

Each stream was stocked with 38 gravid females and 38 mature males. These fish had been collected as juveniles, reared to maturity in single-sex groups, and then mated in groups of 4–5 males and 4–5 females per breeding group before introduction. To minimize the potential for founder effects and equalize genetic diversity in each stream, males and females from each breeding group were introduced into alternate streams. Doing so increased the effective population size of each population, because females retained the sperm from mating with one set of 38 males, then were introduced and subsequently mated with a second set of 38 males. As part of a separate experiment, the riparian forest canopy was experimentally thinned in the Intro2 stream before the introductions*', but the two introduction streams were similar in all other respects.
Laboratory breeding experiments. Laboratory populations used for the gene 
expression assays were second-generation laboratory fish that were originally 
derived from 30 adult females and 30 adult males collected from each of the HP, 
LP and two introduction populations (Intro1, Intro2) in March 2009. This time period represented one year, or 3–4 generations, after the establishment of the introduction populations. Fish were kept in 1.5-l tanks (Aquatic Habitats) connected to a custom-made recirculating system and maintained on a 12:12 h light–dark cycle at 25 ± 1 °C”*3. Fish were reared on standardized food levels adjusted weekly
for age and number of individuals per tank (morning, Tetramin tropical fish flakes, 
Spectrum Brands, Inc.; afternoon, brine shrimp (nauplii of Artemia spp.), Brine 
Shrimp Direct). The quantity of food offered daily approximated ad libitum and 
was comparable to the high level of food administered in other studies™. 

We reared all wild-caught guppies for two generations under common garden 
conditions using a breeding design that retains the genetic variation of the original 
population, prevents inbreeding, and minimizes maternal and other envir- 
onmental effects**. The first generation (G1) line in the laboratory was derived 
from wild-caught juveniles and reared to maturity in the lab. Wild-caught gravid 
females were housed individually until parturition and their offspring were used to 
create G1 family lines. Females that did not give birth within about 30–35 days of capture were randomly crossed with a wild-caught male; however, no two females were crossed with the same male. The G1 offspring from each brood were housed separately until sexed, and then separated into single-sex tanks. Juvenile females (28–56 days old) can be identified by a triangular patch of melanophores on the ventral abdomen, which is absent in males*. Sexing was accomplished by anaesthetizing guppies in buffered MS-222 (0.85 mg ml⁻¹; ethyl 3-aminobenzoate methanesulfonic acid salt; Sigma-Aldrich) and observing the melanophores under a microscope. Males are considered sexually mature when the apical hood grows even with the tip of the gonopodium; females usually mature within ±1–2 days of males**. Mature males and females from each family
line were then randomly chosen and crossed to other families to produce the 
second generation (G2). Each G2 family was the product of a unique cross, to 
minimize inbreeding and maximize the genetic variation within each population. 

Within 24 h of birth, G2 full-sibling broods were randomly assigned to two 1.5-l tanks (2–10 full siblings per tank) that differed in exposure to chemical cues from a predator (reared with or without cues from a predator), using a split-brood design. Siblings reared with cues from predators were reared in recirculating units that housed a pike cichlid within the sump that supplied water to the tanks”**?*°. Chemical predation cues included both kairomones from the cichlid predator and alarm pheromones from the two guppies consumed daily by the cichlid. Guppies reared without cues from predators were housed in identical recirculating units without predators in the water supply. G2 juveniles were anaesthetized and sexed at 29 days (see above). From each population, we randomly selected 5–6 families to raise pairs of male siblings within each rearing treatment.
RNA-sequencing. Focal animals were euthanized by immersion in ice water followed by rapid decapitation (IACUC approved protocol #12-3818A). Whole brains were collected by cutting the head sagittally down the centre line and removing all brain tissue. Brains were then flash frozen in liquid nitrogen and stored at −80 °C until further processing. Tissue collection took <2 min per fish, fast enough to minimize changes in gene expression due to handling. Whenever possible, we combined brains from two full siblings in the same treatment group to ensure we could obtain sufficient RNA for sequencing, while minimizing variation among pooled individuals. To minimize temporal and circadian variation, we performed all dissections within 15 min after lights-on in the morning (fish were all kept on a 12:12 h light–dark cycle). In addition, sampling at lights-on minimized expression differences arising from recent experiences, so our data represent baseline transcription levels. The age of the fish (118–154 days) and the timing of sampling were randomly distributed across populations. Because all dissections occurred within 15 min, no more than 8 individuals (1–2 families distributed across both treatments) could be sampled per day, and the order in which populations were sampled was randomized.

RNA was extracted from whole brain tissue using the Qiagen RNeasy lipid extraction kit. A separate sequencing library was prepared for each pooled family, using unique index sequences from the Illumina TruSeq RNA kit following the manufacturer’s instructions. Sequencing libraries were constructed and sequenced on three lanes of an Illumina HiSeq 2000 at the HudsonAlpha Genomic Services Laboratory (Huntsville, Alabama) in April 2012. In total, 32 samples were sequenced in 5 lanes (sample sizes that passed quality filters: n = 5 for HP reared with and without predators, n = 4 for Intro1 and Intro2 reared with and without predators, n = 3 for LP reared with predators, and n = 2 for LP reared without predators). We obtained 736,693,718 100-base-pair (bp) reads that passed the machine quality filter, with 17,517,493 to 28,265,561 reads per sample and average quality >35.6 for all samples.

Sequencing reads were mapped to a high-quality brain-specific reference 
transcriptome for P. reticulata. We constructed the reference from a data set 
containing >450 million 100-bp paired-end reads, which were filtered for high- 
quality sequences and normalized in silico to compress the range in k-mer abund- 
ance. We used SeqMan NGEN 4.1.2 to perform the assembly, which contained 
41,347 contigs, N50 = 2,548, and recovered 63% of Tilapia (Oreochromis niloticus) 
Ensembl proteins (Release 70). Contigs from the assembly were annotated by 
blastx queries against SwissProt (database downloaded 6 October 2012), 
UniProt/Trembl (28 November 2012), and nr (11 December 2012). Default para- 
meters were used in the blastx queries, with e-value cut-off of 1 x 10°+. 

Reads were mapped to the reference assembly using Bowtie 2 v2.0.0 on a server 
running Red Hat Enterprise Linux version 6.5. We used a seed size of 20 bp, with no mismatches allowed in the seed (run options: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75). We retained mappings with quality scores >30 (<0.001 probability that the read maps elsewhere in the reference) and kept only contigs represented by ≥1 count per million reads in at least three samples. After removing low-abundance transcripts,
628,797,716 reads (85.3%) mapped to 37,493 unique contigs in the reference tran- 
scriptome. We used the number of reads mapping to each of those contigs along 
with TMM-normalized library sizes** to analyse differential expression. 
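The abundance filter described above (at least one count per million mapped reads in at least three samples) can be sketched in plain Python. This is an illustration under assumptions: counts are arranged one row per contig and one column per sample, library sizes are raw column totals, and the function name is ours. TMM normalization (for example, `calcNormFactors` in the Bioconductor package edgeR) is not reproduced here.

```python
from typing import Sequence


def filter_by_cpm(
    counts: Sequence[Sequence[int]],
    min_cpm: float = 1.0,
    min_samples: int = 3,
) -> list[int]:
    """Return indices of contigs with CPM >= min_cpm in >= min_samples samples.

    counts[i][j] is the number of reads from sample j mapped to contig i.
    """
    n_samples = len(counts[0])
    # Library size of each sample = total mapped reads across all contigs.
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    kept = []
    for i, row in enumerate(counts):
        n_pass = sum(
            1
            for j, c in enumerate(row)
            if lib_sizes[j] > 0 and c * 1e6 / lib_sizes[j] >= min_cpm
        )
        if n_pass >= min_samples:
            kept.append(i)
    return kept
```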

Data analysis. Between-group analysis (BGA) was conducted” as implemented in 
the R package made4 (ref. 37). BGA is a multivariate discriminant approach that is 
appropriate when the number of variables exceeds the number of cases; it is carried 
out by ordinating groups of samples and projecting the individual sample loca- 
tions on the resulting axes. We used principal components analysis (PCA) as the 
ordination method (Fig. 1). To quantify the rate of evolution along the axis 
separating the HP source population from the introduction populations, we calculated evolutionary divergence in Haldanes**. We assumed a time of 3.5 generations, and used the difference in mean transcript abundance in the no-predator
treatment with a pooled standard deviation** (see Extended Data Figs 2a, b). 
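A rate in haldanes expresses the change in a trait mean in units of pooled standard deviation per generation. The sketch below follows that definition for one transcript, assuming expression values are already normalized, a degrees-of-freedom-weighted pooled standard deviation, and g = 3.5 generations as stated above; the function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def pooled_sd(a: Sequence[float], b: Sequence[float]) -> float:
    """Pooled standard deviation of two samples, weighted by degrees of freedom."""
    na, nb = len(a), len(b)
    return sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))


def haldanes(
    source: Sequence[float], derived: Sequence[float], generations: float = 3.5
) -> float:
    """Divergence in haldanes: mean difference in pooled-SD units per generation.

    Both samples are expression values from fish reared without predator cues.
    """
    return (mean(derived) - mean(source)) / pooled_sd(source, derived) / generations
```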

We used random permutation tests to evaluate differential expression across 
populations and treatment groups. Permuted data sets were generated by ran- 
domly reassigning entire RNA-seq samples among population and treatment 
categories to produce an empirical null distribution against which to test hypo- 
theses. This approach preserves any non-independence among transcripts that 
could bias inferences if the non-independence were not taken into account. We 
first computed transcript-specific test statistics from the actual data (see below) 
and compared that statistic to the distribution of the same statistic derived from 
250 permuted data sets. If the statistic for the real data fell within the extreme tails 
of the permuted values for that transcript, we called the transcript differentially 
expressed (DE). To determine if more transcripts were called DE than expected, 
we compared the number of DE transcripts in the real data set to the distribution of 
that number in the 250 permuted data sets. 
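The sample-level permutation scheme can be sketched for a two-group contrast as follows. Several details are assumptions rather than statements from the text: the per-transcript statistic here is a simple difference in group means (the study used general linear model statistics computed in SAS), the extreme tails are read as a two-sided empirical P value ≤ 0.05, and the +1 correction in that P value is our choice. Because whole samples are relabelled, every transcript in a sample moves together, which preserves correlations among transcripts.

```python
import random
from statistics import mean
from typing import Sequence


def mean_diffs(data: Sequence[Sequence[float]], labels: Sequence[int]) -> list[float]:
    """Per-transcript difference in mean expression, group 1 minus group 0.

    data[s][t] is the expression of transcript t in sample s.
    """
    g0 = [row for row, lab in zip(data, labels) if lab == 0]
    g1 = [row for row, lab in zip(data, labels) if lab == 1]
    return [mean(r[t] for r in g1) - mean(r[t] for r in g0) for t in range(len(data[0]))]


def empirical_p(value: float, null_values: Sequence[float]) -> float:
    """Two-sided permutation P value with a +1 correction."""
    hits = sum(1 for v in null_values if abs(v) >= abs(value))
    return (hits + 1) / (len(null_values) + 1)


def permutation_de(data, labels, n_perm=250, alpha=0.05, seed=1):
    """Call DE transcripts against a null built by shuffling whole-sample labels.

    Returns (indices of DE transcripts, number of DE calls in each permuted data set).
    """
    rng = random.Random(seed)
    observed = mean_diffs(data, labels)
    null = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # reassign entire samples among groups
        null.append(mean_diffs(data, shuffled))
    null_by_t = list(zip(*null))  # permutation distribution for each transcript
    de = [t for t, obs in enumerate(observed) if empirical_p(obs, null_by_t[t]) <= alpha]
    # Apply the same calling rule to every permuted data set to get the null count.
    null_counts = [
        sum(1 for t, s in enumerate(perm) if empirical_p(s, null_by_t[t]) <= alpha)
        for perm in null
    ]
    return de, null_counts
```

The number of DE calls in the real data can then be compared with the distribution of `null_counts`.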

To determine if transcripts were significantly evolved in each introduction 
population we restricted the analysis to samples collected from fish reared without 
predator cues. For both the actual and the permuted data sets, a general linear model was applied separately to each transcript, with the normalized, transformed number of reads as the dependent variable and population (HP and Intro1, or HP and Intro2, depending on the analysis) as a fixed effect. We then used general linear statistical models to assess divergence in the two introduction populations and the natural LP population (that is, HP versus Intro1, HP versus Intro2 and
HP versus LP) for each transcript. If the test statistic for each of the three contrasts 
fell in the extreme 5% of the distribution of the permutation test statistics, and the 
contrasts all had the same sign, we called the transcript concordantly differentially 
expressed (CDE). To calculate the number of transcripts expected to be called CDE 
in the two introduction populations under random expectations, we conducted 
this same analysis in each of the 250 permuted data sets, and calculated the number 
of transcripts meeting the same criteria. This permutation analysis accounts for 
any spurious associations that might result from comparing both introduction 
populations to the same ancestral HP population”. 
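The concordance rule reduces to two checks per transcript: every contrast is extreme relative to its own permutation distribution, and all contrasts share a sign. A minimal sketch, assuming the extreme 5% means that at most 5% of permuted absolute values are as large as the observed one; the function names are hypothetical.

```python
from typing import Sequence


def in_extreme_tail(value: float, null_values: Sequence[float], alpha: float = 0.05) -> bool:
    """True if at most a fraction alpha of |null| values reach |value|."""
    exceed = sum(1 for v in null_values if abs(v) >= abs(value))
    return exceed / len(null_values) <= alpha


def is_cde(
    contrasts: Sequence[float],
    nulls: Sequence[Sequence[float]],
    alpha: float = 0.05,
) -> bool:
    """Concordant differential expression for one transcript.

    contrasts holds the observed statistics for HP vs Intro1, HP vs Intro2
    and HP vs LP; nulls[k] is the permutation distribution for contrast k.
    """
    all_extreme = all(
        in_extreme_tail(c, null, alpha) for c, null in zip(contrasts, nulls)
    )
    same_sign = all(c > 0 for c in contrasts) or all(c < 0 for c in contrasts)
    return all_extreme and same_sign
```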

To test if the divergence in gene expression is greater than would be expected by 
neutral processes, we calculated P_ST (a measure of phenotypic divergence between populations) from phenotypic variance components as in ref. 40, assuming h² = 0.5 (where h² is the heritability of a trait) for transcript expression level. This h² estimate is substantially higher than the average estimate from a recent analysis in sticklebacks™; because a higher assumed h² yields a lower P_ST, this choice makes our comparison of P_ST with published F_ST estimates (F_ST is a measure of genetic divergence between populations) conservative with respect to
the hypothesis that divergence is greater than expected under genetic drift. 
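A commonly used form of the P_ST estimator, which we assume here (the exact expression in ref. 40 may differ), is P_ST = σ²_B / (σ²_B + 2h²σ²_W), where σ²_B and σ²_W are the between- and within-population variance components of a phenotype. With h² = 0.5 it reduces to σ²_B / (σ²_B + σ²_W). The sketch estimates both components for one transcript from a one-way ANOVA (method of moments); the function names are ours.

```python
from statistics import mean
from typing import Sequence


def variance_components(groups: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Between- and within-population variance components from a one-way ANOVA.

    Method-of-moments estimator; a negative between-group estimate is set to zero.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    group_means = [mean(g) for g in groups]
    grand = mean(x for g in groups for x in g)
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(sizes, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective group size, which handles unequal sample sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    sigma2_b = max(0.0, (ms_between - ms_within) / n0)
    return sigma2_b, ms_within


def p_st(groups: Sequence[Sequence[float]], h2: float = 0.5) -> float:
    """P_ST = s2_B / (s2_B + 2 * h2 * s2_W) for one transcript."""
    s2_b, s2_w = variance_components(groups)
    denominator = s2_b + 2 * h2 * s2_w
    return s2_b / denominator if denominator > 0 else 0.0
```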

We assessed the association between evolutionary divergence and ancestral 
plasticity in gene expression by conducting likelihood ratio tests of independence
and comparing the resulting χ² value to the distribution of χ² values produced by
conducting the same test on the 250 permuted data sets. Similarly, for the CDE 
transcripts, we calculated the Spearman rank correlation between evolution (mean 
change in expression level between HP and introduction populations in the 
no-predator-cue environment) and plasticity (mean change in expression in the
HP ancestral population reared in the two predator-exposure environments), and
compared that value to the distribution of values obtained from 1,000 random 
permutations of the population and treatment group labels. This permutation 
analysis accounts for any spurious correlation that can result because the calcula- 
tions for evolutionary divergence and plasticity share a common term (mean 
expression level in the HP source population reared without predator cues)”. 
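For a 2 × 2 table of plasticity direction (rows) against direction of evolutionary change (columns), the likelihood ratio test of independence is the G-test, G = 2 Σ O ln(O/E), where E is the expected count under independence; G is referred to χ² with (r − 1)(c − 1) d.f. A minimal sketch follows; the table layout and function name are assumptions, and in the study the statistic was judged against 250 permuted values rather than only the asymptotic χ² distribution.

```python
from math import log
from typing import Sequence


def g_test_independence(table: Sequence[Sequence[int]]) -> float:
    """Likelihood-ratio (G) statistic for independence in an r x c table.

    G = 2 * sum(O * ln(O / E)), with E = row_total * col_total / grand_total;
    empty cells contribute zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / grand
                g += observed * log(observed / expected)
    return 2 * g
```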

For CDE transcripts, we quantified plasticity in the source population and in the 
introduced populations as the difference in the mean expression values (normal- 
ized log-transformed number of reads mapping to a given transcript) for each 
transcript in the two predator-cue treatment groups within each population. We 
then calculated the change in these plasticity values between the source and intro- 
duction populations and used a nonparametric sign test to determine if that 
change was significant. We evaluated the association between ancestral and des- 
cendant plasticity in the CDE transcripts using a Spearman’s rank correlation, and 
determined significance of that correlation using a random permutation test. 
Starting with the mean expression levels for each transcript within each popu- 
lation/treatment group, we randomly permuted the population/treatment labels 
1,000 times, recalculated ancestral and derived plasticity values for each transcript 
in each permutation, and calculated Spearman’s rank correlation of the permuted 
values. All statistical analyses were implemented in SAS 9.4 (SAS 2011) running in 
a Linux environment. 
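The label-permutation test for a Spearman correlation can be sketched as below, here for the Fig. 3 analysis (magnitude of ancestral plasticity against change in plasticity). Assumptions not stated in the text: each transcript carries four group means (HP with cue, HP without cue, introduction with cue, introduction without cue), plasticity is the no-cue mean minus the with-cue mean, the change is taken on absolute values, and the four labels are shuffled independently for each transcript in each permutation. Because |ancestral plasticity| enters both variables, a negative correlation can arise from the shared term alone; the permutation null captures that artefact. Function names are hypothetical.

```python
import math
import random
from typing import Sequence


def ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average
        start = end + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined when one variable is constant; treated as 0
    return sxy / math.sqrt(sxx * syy)


def plasticity_change(means: Sequence[float]) -> tuple[float, float]:
    """(|ancestral plasticity|, change in |plasticity|) for one transcript.

    means = (HP with cue, HP without cue, introduction with cue, introduction without cue).
    """
    hp_cue, hp_no_cue, intro_cue, intro_no_cue = means
    ancestral = abs(hp_no_cue - hp_cue)
    return ancestral, abs(intro_no_cue - intro_cue) - ancestral


def permutation_spearman(group_means, n_perm=1000, seed=1):
    """Observed rho and a one-sided permutation P value (rho at least this negative)."""
    rng = random.Random(seed)
    x, y = zip(*(plasticity_change(m) for m in group_means))
    observed = spearman(x, y)
    null = []
    for _ in range(n_perm):
        permuted = []
        for m in group_means:
            labels = list(m)
            rng.shuffle(labels)  # reassign the four population/treatment labels
            permuted.append(plasticity_change(labels))
        px, py = zip(*permuted)
        null.append(spearman(px, py))
    as_extreme = sum(1 for r in null if r <= observed)
    return observed, (as_extreme + 1) / (n_perm + 1)
```

With two introduction populations, one option is to average their group means before calling the function; that choice is also an assumption.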
Extended Data Figure 1 | Map of Trinidad where the experimental transplants took place. Guppies were moved from a high-predation (HP) locality where they coexist with cichlid predators and introduced into two streams that lacked cichlids and guppies, Intro1 (left photograph) and Intro2 (right photograph). A naturally occurring guppy population without cichlids, low-predation (LP), was sampled to provide a low-predation reference.

Extended Data Figure 2 | Frequency histogram of Haldanes for the top 500 transcripts loading on PC2, the axis representing rapid evolutionary divergence between the source and introduction populations. a, Intro1 (median Haldane = 0.256, range = 0.07–0.74). b, Intro2 (median = 0.226, range = 0.10–1.68).

Extended Data Figure 3 | Ancestral plasticity and evolution in patterns of gene expression for a representative gene: uridine phosphorylase 2 (upp2). Shown is the plastic response of the high-predation source population and the evolved responses in the two experimental introduction populations (Intro1 and Intro2). In this case the plastic response results in a decrease in expression, whereas the evolved response in the introduction populations is to increase expression, thus illustrating non-adaptive plasticity.

Extended Data Figure 4 | Scatter plot of ancestral plasticity (change in transcript abundance in response to the absence of cichlid predator cues) and population divergence. Shown are the 565 transcripts that exhibited significant differences in expression between the predator and non-predator rearing treatments in the HP source population. We found a similar pattern to that found for the CDE transcripts (Fig. 2): 75% (424 of 565) of the significantly plastic genes exhibited population divergence in the introduction populations in the opposite direction to plasticity (χ² = 284.2, d.f. = 1). This result falls in the upper percentile of the 250 permuted χ² values (median permuted value = 19.1, interquartile range = 6.7–50.8). Only eight transcripts were common to the data sets that were significantly evolved (CDE; Figs 2, 3) and significantly plastic, suggesting that short-term plastic responses and longer-term evolutionary responses involve largely different sets of genes.

Extended Data Table 1 | Comparison of gene expression divergence (P_ST) with divergence of putatively neutral microsatellite loci (F_ST)

              Intro1†        Intro2‡        Data set
  P_ST*       0.32 (0.21)    0.27 (0.21)    Only CDE transcripts
              0.05 (0.11)    0.05 (0.12)    Only non-CDE transcripts
              0.05 (0.11)    0.05 (0.10)    All transcripts
  F_ST§       0.01           N/A            10 microsatellite loci

*Quantitative divergence estimated by P_ST, a phenotypic proxy for quantitative genetic divergence Q_ST*°, calculated under the conservative assumption that half the within-population variation was heritable. Numbers in parentheses are standard deviations.
§Neutral divergence estimated from 10 microsatellite loci*?.
†Divergence between the ancestral HP site (Guanapo) and the Intro1 site (Lower Lalaja).
‡Divergence between the ancestral HP site (Guanapo) and the Intro2 site (Upper Lalaja).

A new cyanogenic metabolite in Arabidopsis
required for inducible pathogen defence 


Jakub Rajniak', Brenden Barco’, Nicole K. Clay? & Elizabeth S. Sattely' 


Thousands of putative biosynthetic genes in Arabidopsis thaliana 
have no known function, which suggests that there are numerous 
molecules contributing to plant fitness that have not yet been discov- 
ered'”. Prime among these uncharacterized genes are cytochromes 
P450 upregulated in response to pathogens**. Here we start with a 
single pathogen-induced P450 (ref. 5), CYP82C2, and use a combina- 
tion of untargeted metabolomics and coexpression analysis to 
uncover the complete biosynthetic pathway to 4-hydroxyindole-3- 
carbonyl nitrile (4-OH-ICN), a previously unknown Arabidopsis 
metabolite. This metabolite harbours cyanogenic functionality that 
is unprecedented in plants and exceedingly rare in nature’; further- 
more, the aryl cyanohydrin intermediate in the 4-OH-ICN pathway 
reveals a latent capacity for cyanogenic glucoside biosynthesis*” in 
Arabidopsis. By expressing 4-OH-ICN biosynthetic enzymes in 
Saccharomyces cerevisiae and Nicotiana benthamiana, we reconstit- 
ute the complete pathway in vitro and in vivo and validate the func- 
tions of its enzymes. Arabidopsis 4-OH-ICN pathway mutants show 
increased susceptibility to the bacterial pathogen Pseudomonas 
syringae, consistent with a role in inducible pathogen defence. 
Arabidopsis has been the pre-eminent model system’°" for studying 
the role of small molecules in plant innate immunity”; our results 
uncover a new branch of indole metabolism distinct from the canon- 
ical camalexin pathway, and support a role for this pathway in the 
Arabidopsis defence response’’. These results establish a more com- 
plete framework for understanding how the model plant Arabidopsis 
uses small molecules in pathogen defence. 

To identify cytochromes P450 potentially involved in the biosyn- 
thesis of novel defence-associated small molecules, we obtained raw 
data sets for all transcriptomics experiments dealing with biotic stress 
in A. thaliana from the NASCArrays database. We examined CYP 
genes present in the probeset and selected a candidate, CYP82C2, that 
is highly expressed under a variety of pathogen treatment conditions, 
but whose native function in Arabidopsis is unknown (Fig. 1a). 

To identify small molecules whose levels change in a CYP82C2- 
dependent manner, we performed comparative metabolomics™ with 
a homozygous transfer-DNA insertion line of CYP82C2. We used the 
bacterial pathogen P. syringae pv. tomato DC3000 harbouring the 
avrRpm1 avirulence gene (Psta) as an elicitor, because CYP82C2 expression is strongly upregulated 24 h after inoculation with this strain (Fig. 1a). We analysed tissue methanolic extracts of 11-day-old seedlings grown hydroponically in the presence of Psta by liquid chromatography–mass spectrometry (LC-MS), and computationally compared
mutant and wild-type (WT) Col-0 metabolomes. From this analysis, 
we identified 11 compound mass signals that reproducibly and signifi- 
cantly differ between WT and cyp82C2 (Fig. 1b); these mass ions are 
induced after pathogen elicitation and are not bacterially derived 
(Extended Data Fig. 1a). 

We next sought to obtain clues about the structure of these com- 
pounds from their tandem mass spectra (MS/MS). MS/MS analysis 
revealed that the 11 compounds could be divided into two classes 
(A and B in Fig. 1b), assigned as indole-3-carboxaldehyde (IAL) derivatives with (B) and without (A) hydroxylated indole systems.
Moreover, the fact that the cyp82C2 mutant lacked all the hydroxylated 
derivatives but accumulated excess amounts of their non-hydroxylated 
counterparts suggested that CYP82C2 acts as an indolic hydroxylase. 
However, except for compound A1 (Fig. 2b), which was confirmed to 
be indole-3-carboxylic acid methyl ester, the structures of these com- 
pounds remained elusive. 

To facilitate structural analysis, we investigated whether any of these 
compounds were exuded into the medium in the cyp82C2 mutant 
seedling experiments (Fig. 1d). Filtered spent medium was loaded onto 
a C18 silica gel cartridge, and non-polar metabolites were eluted with 
acetonitrile and analysed by LC-MS. Surprisingly, the profile of spent 
medium extracted in this manner was notably different from that 
of tissue methanolic extracts: while small amounts of A2–A7 were present, no A1 could be detected; instead, a new ultraviolet-active compound with m/z = 171.0553 [M+H]⁺ dominated the LC-MS trace (Fig. 1d). NMR analysis of this compound followed by comparison with a synthetic standard established its identity as the novel metabolite indole-3-carbonyl nitrile (ICN) (Fig. 1c and Extended Data Fig. 2).

Chemically, the most striking feature of ICN is the presence of a highly reactive α-ketonitrile moiety that, to our knowledge, has not been found in any plant natural product; however, benzoyl cyanide has been previously identified in the secretions of millipedes®’. The α-ketonitrile is susceptible to nucleophilic attack, resulting in the displacement of cyanide ion: in alkaline aqueous solution, ICN degrades to indole-3-carboxylic acid (ICA) (an alternative route to ICA in Arabidopsis has been reported’); in methanol, ICA methyl ester (A1) is formed instead, explaining the presence of A1 and the absence of ICN in methanolic extracts (Fig. 1c). Modifying the tissue extraction procedure by using an acidified 1:1 acetonitrile/water mixture enabled direct detection of ICN by LC-MS; additionally, when deuterated methanol was used, only the deuterated form of A1 was observed (Extended Data Fig. 1b–e).
the synthesis of an authentic standard, A6 was shown to be a serine- 
ICN addition product (see Fig. 2b). However, in the presence of 
cysteine and structurally related compounds, ICN can undergo a spon- 
taneous cycloaddition, resulting in the formation of a thiazoline ring 
and the net loss of ammonia. This last observation allowed us to deter- 
mine the structures of and synthesize standards for compounds A2- 
A5, which are the cycloaddition products of ICN and cysteine (A4) or 
Cys-Gly dipeptide (A2) and their thiazole analogues (A5 and A3, 
respectively; see Fig. 2b, Extended Data Fig. 3, and Supplementary 
Table 1). 
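The reactions in this paragraph can be followed as atom bookkeeping. This minimal sketch, with hypothetical helper names, writes each species as an element-count mapping and checks that cyanide displacement by water or methanol gives ICA (C9H7NO2) or its methyl ester A1 (C10H9NO2), and that cysteine cycloaddition with net loss of NH3 gives the thiazoline adduct. Treating the thiazole analogue as the thiazoline minus H2 is our assumption about the oxidation step, and the ICN formula C10H6N2O is inferred from its structure.

```python
from collections import Counter


def combine(*terms):
    """Sum signed multiples of formulas; each term is (coefficient, Counter)."""
    total = Counter()
    for coeff, formula in terms:
        for element, n in formula.items():
            total[element] += coeff * n
    # Drop elements that cancel out entirely.
    return Counter({el: n for el, n in total.items() if n != 0})


def hill(formula):
    """Hill-order string (C, then H, then alphabetical) for a carbon-containing formula."""
    order = ["C", "H"] + sorted(el for el in formula if el not in ("C", "H"))
    return "".join(el if formula[el] == 1 else f"{el}{formula[el]}" for el in order if formula.get(el))


ICN = Counter(C=10, H=6, N=2, O=1)
H2O = Counter(H=2, O=1)
MEOH = Counter(C=1, H=4, O=1)
HCN = Counter(C=1, H=1, N=1)
CYS = Counter(C=3, H=7, N=1, O=2, S=1)
NH3 = Counter(N=1, H=3)
H2 = Counter(H=2)

ica = combine((1, ICN), (1, H2O), (-1, HCN))   # hydrolysis, cyanide released
a1 = combine((1, ICN), (1, MEOH), (-1, HCN))   # methanolysis during extraction
a4 = combine((1, ICN), (1, CYS), (-1, NH3))    # thiazoline adduct with cysteine
a5 = combine((1, a4), (-1, H2))                # thiazole analogue (assumed -H2)

for name, f in [("ICA", ica), ("A1", a1), ("A4", a4), ("A5", a5)]:
    print(name, hill(f))
```

Replacing CH3OH by CD3OD in the same bookkeeping would place the CD3 group on the ester, which is the logic behind the deuterium-labelling control in Extended Data Fig. 1.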

Figure 1 | Transcriptomic and metabolomic analyses implicate CYP82C2 in the biosynthesis

The absence of the hydroxylated analogues B1-B6 in the cyp82C2 insertion line pointed to ICN as the likely substrate for this enzyme. Incubation of ICN with yeast-expressed CYP82C2 yielded only a trace amount of hydroxylated ICN, but a significant amount of 4-hydroxy-ICA (4-OH-ICA) (structure shown in Fig. 3), as confirmed by NMR spectroscopy and comparison with a synthetic standard (Extended Data Fig. 4a-d). Since CYP82C2 shows no activity on ICA, we deduced that CYP82C2 converts ICN to 4-OH-ICN, competing with hydrolysis of ICN to ICA (Extended Data Fig. 4e, f). Further experiments with chemically synthesized 4-OH-ICN showed that its half-life is approximately 3 min in aqueous solution at pH 7.5 (Supplementary Table 2), rendering direct isolation of the 4-OH-ICN product infeasible. Chemical synthesis of 4-OH-ICN further enabled the synthesis of the 4-hydroxy derivatives of A1-A6, confirming that these correspond to compounds B1-B6 seen in WT tissue extracts (Extended Data Fig. 3 and Supplementary Table 1). Therefore, all the metabolites identified in our initial metabolomics experiment with cyp82C2 are ultimately derived from ICN, either as artefacts of the extraction (A1 and B1) or as in vivo addition products (A2-A7, B2-B6).
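The practical consequence of a ~3 min half-life can be made concrete with first-order decay, where t1/2 = ln 2 / k. The sketch below assumes simple exponential (first-order) kinetics, which the text does not state explicitly; the function names are hypothetical.

```python
import math


def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of the starting amount left after t_min, assuming first-order decay."""
    k = math.log(2) / half_life_min  # first-order rate constant, 1/min
    return math.exp(-k * t_min)


def time_to_fraction(fraction: float, half_life_min: float) -> float:
    """Minutes until only `fraction` of the starting amount remains."""
    return half_life_min * math.log2(1 / fraction)


HALF_LIFE_MIN = 3.0  # approximate 4-OH-ICN half-life at pH 7.5 quoted in the text
print(round(fraction_remaining(30, HALF_LIFE_MIN), 5))  # 0.00098
print(round(time_to_fraction(0.01, HALF_LIFE_MIN), 1))  # 19.9
```

Under that assumption, less than 0.1% of 4-OH-ICN would survive a 30 min workup, consistent with inferring it from its hydrolysis product 4-OH-ICA rather than isolating it.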

We next investigated the biosynthesis of ICN, using the CYP82C2 gene as bait for coexpression analysis. In our pathogen data set, the CYP79B2 gene, whose encoded enzyme converts tryptophan (Trp) into indole-3-acetaldoxime (IAOx)’*, has the second-highest correlation (Pearson's r) with CYP82C2 among all genes profiled (Supplementary Table 3). We performed a metabolomic analysis of the cyp79B2 cyp79B3 double knockout line’, which is deficient in IAOx production. No ICN-derived metabolites are produced in this mutant (Fig. 2a), indicating that ICN is derived from IAOx.
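The bait-based coexpression search can be sketched as ranking every profiled gene by its Pearson correlation with the bait across conditions. The expression values and the gene names other than the bait are hypothetical placeholders, not data from this study, and the real analysis may have applied additional filtering.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_by_coexpression(bait, profiles):
    """Return (gene, r) pairs sorted from most to least correlated with the bait."""
    scores = [(gene, pearson(profiles[bait], p)) for gene, p in profiles.items() if gene != bait]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical expression values across six conditions (not data from the study).
profiles = {
    "CYP82C2": [1.0, 1.2, 5.0, 6.1, 0.9, 5.5],
    "GENE_A": [2.0, 2.3, 9.8, 12.5, 1.7, 11.0],
    "GENE_B": [3.0, 2.9, 3.1, 3.0, 3.2, 2.8],
    "GENE_C": [6.0, 5.5, 1.0, 0.8, 6.2, 1.1],
}
for gene, r in rank_by_coexpression("CYP82C2", profiles):
    print(f"{gene}\t{r:+.2f}")
```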

In searching for the enzyme(s) responsible for further conversion of IAOx to ICN, we postulated a biosynthetic route paralleling that of the cyanogenic glycoside dhurrin®: a CYP79-catalysed formation of an aldoxime, followed by a CYP71-catalysed formation of a cyanohydrin intermediate. In the dhurrin pathway, the cyanohydrin is glucosylated to yield the final product, whereas in ICN biosynthesis, a final dehydrogenation is required to produce an α-ketonitrile (Fig. 2c).

Figure 2 | Targeted metabolic profiling of

Figure 3 | In vitro reconstitution of 4-OH-ICN biosynthesis from IAOx. Combined extracted ion chromatograms (EICs) for IAOx substrate and reaction products for various subsets of enzymes in the 4-OH-ICN pathway; 4-OH-ICN could not be detected directly and its hydrolysis product 4-OH-ICA is shown instead.

Correlation analysis implicated CYP71A12, a P450 linked to camalexin biosynthesis"’, as the most likely candidate gene for the cyanohydrin formation step (Supplementary Table 3). Profiling of the cyp71A12 transfer-DNA insertion line, as well as transfer-DNA insertion lines of its two closest Arabidopsis homologues, CYP71A13 and CYP71A18, indicated that the CYP71A12 gene is probably responsible: all ICN derivatives with the exception of A6 are at ~10% of WT levels in the cyp71A12 mutant, but unaffected in the cyp71A13 and cyp71A18 mutants (Fig. 2a). Levels of camalexin and other indolic metabolites were only slightly changed in whole-seedling tissue extracts of the cyp71A12 mutant (Extended Data Fig. 5c).

Further correlation analysis using CYP71A12 as bait revealed a cluster of five tandemly arrayed homologous genes, At1g26380-At1g26420, that are highly coexpressed with CYP71A12 (Supplementary Table 3). At1g26380 encodes a flavin-dependent oxidoreductase known as FOX1 (ref. 19). We profiled the corresponding homozygous transfer-DNA insertion lines for these genes and found a three- to fivefold reduction in levels of ICN metabolites in the fox1 mutant, with no significant changes observed for the other mutants (Fig. 2a). Additionally, we observed a build-up of IAL, the expected hydrolysis product of the indole-3-cyanohydrin intermediate (Extended Data Fig. 5d). More strikingly, the fox1 mutant accumulates new mass signals corresponding to indole cyanogenic glycosides (ICGs), not previously observed in plants (Extended Data Fig. 6a-e; structures shown in Fig. 4d). Cyanogenic glycoside compounds are widely distributed in the plant kingdom, but have not yet been detected in Arabidopsis°. Disruption of the ICN pathway at the FOX1-catalysed step therefore leads to capture of some portion of the cyanohydrin intermediate by non-specific glycosyltransferases, exactly paralleling dhurrin synthesis®.

We sought to confirm the proposed biochemical transformations (Fig. 2c) by reconstituting the complete pathway in vitro. A combination of yeast microsomal CYP71A12 and CYP82C2 and N. benthamiana-expressed FOX1 was sufficient to catalyse the conversion of IAOx to ICN, as illustrated in Fig. 3; the production of 4-OH-ICN is inferred from the accumulation of 4-OH-ICA. We also reconstituted the biosynthesis of 4-OH-ICN in the heterologous host N. benthamiana, using transient expression of the four pathway genes necessary for production of 4-OH-ICN from Trp via Agrobacterium-mediated transient transformation”. We observed significant accumulation of B1 (from methanol extraction of 4-OH-ICN) only when all pathway genes were present; however, we also noted background levels of ICA and IAL when only early pathway genes were expressed (Extended Data Fig. 7). Notably, when we expressed CYP79B2 and CYP71A12 but not FOX1, we again observed the accumulation of ICG mass signals (Extended Data Fig. 6f).
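The reconstitution logic (which products appear for which enzyme subsets) can be expressed as reachability in a small reaction network. This is a deliberately simplified, hypothetical model built only from the steps named in the text; it ignores rates and competition and treats hydrolysis and non-specific glycosylation as always available.

```python
# Simplified network built only from steps described in the text.
# Each entry: (substrate, product, enzyme); None marks a non-enzymatic or
# non-specific step that is treated here as always possible.
REACTIONS = [
    ("Trp", "IAOx", "CYP79B2"),
    ("IAOx", "indole-3-cyanohydrin", "CYP71A12"),
    ("indole-3-cyanohydrin", "ICN", "FOX1"),
    ("ICN", "4-OH-ICN", "CYP82C2"),
    ("indole-3-cyanohydrin", "IAL", None),   # hydrolysis of the cyanohydrin
    ("indole-3-cyanohydrin", "ICGs", None),  # capture by non-specific glycosyltransferases
    ("ICN", "ICA", None),                    # hydrolysis with loss of cyanide
    ("4-OH-ICN", "4-OH-ICA", None),          # rapid hydrolysis
]


def reachable(start, expressed):
    """Return every metabolite reachable from `start` using only the enzymes in `expressed`."""
    found = {start}
    changed = True
    while changed:
        changed = False
        for substrate, product, enzyme in REACTIONS:
            allowed = enzyme is None or enzyme in expressed
            if allowed and substrate in found and product not in found:
                found.add(product)
                changed = True
    return found


print(sorted(reachable("Trp", {"CYP79B2", "CYP71A12"})))
# ['IAL', 'IAOx', 'ICGs', 'Trp', 'indole-3-cyanohydrin']
print(sorted(reachable("Trp", {"CYP79B2", "CYP71A12", "FOX1", "CYP82C2"})))
```

Because it tracks only what is possible, the model also lists ICGs when FOX1 is present; their strong accumulation without FOX1 reflects competition for the cyanohydrin, which a reachability model cannot capture.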

Figure 4 | Camalexin and CYP82C2-synthesized 4-OH-ICN contribute non-redundantly to disease resistance against the virulent bacterial pathogen P. syringae. a, Growth analysis of the virulent P. syringae pv. tomato DC3000 (Pst) in surface-inoculated adult leaves. Data represent the mean ± s.e.m. of four biological replicates. Data points labelled with different letters are significantly different (P < 0.05, two-tailed t test); data points labelled with the same letter are not significantly different. WT, Col-0 ecotype; c.f.u., colony-forming units. b, Growth analysis of Pst in 10-day-old seedlings pre-treated with water or 1 µM of the bacterial MAMP flg22 for 6 h. Data represent the median ± s.e.m. of four biological replicates of 10-15 seedlings each. Different letters denote statistically significant differences (P < 0.05, two-tailed t test). c, Growth analysis of Pst in WT adult leaves pre-immunized with 1 µM flg22 and 100 µM ICN, 4-OH-ICN, camalexin or solvent control (dimethylsulfoxide (DMSO)) for 24 h before infiltration with Pst. Data represent the median ± s.e.m. of three biological replicates. Asterisk denotes statistical significance relative to WT (P < 0.01, two-tailed t test). The experiment was repeated three times, producing similar results. d, Summary of known major Trp-derived secondary metabolites in Arabidopsis and oxidative biosynthetic enzymes that have been used to reconstitute the pathways in vitro or in planta.

The Trp-derived metabolites camalexin and 4-methoxyindol-3-ylmethylglucosinolate (4-methoxyglucobrassicin) have been shown to play a key role in Arabidopsis immunity (Fig. 4d)'°''"***. To evaluate whether 4-OH-ICN pathway products also contribute to Arabidopsis disease resistance, we challenged 4-OH-ICN biosynthetic mutants with a diverse panel of pathogens. Using surface inoculation to mimic the natural infection process, we found that, compared with WT, the adult leaves of cyp71A12 and cyp82C2 are more susceptible to the virulent bacterial hemibiotroph Pst (P. syringae pv. tomato DC3000), comparable to the immuno-deficient fls2 mutant, which cannot perceive the bacterial microbe-associated molecular pattern (MAMP) flg22 (refs 22, 23) (Fig. 4a). Similarly, seedlings of the 4-OH-ICN pathway mutants are more susceptible to Pst than WT in both the presence and absence of flg22 (Fig. 4b), indicating a role for 4-OH-ICN in basal disease resistance against a bacterial pathogen. Notably, the adult leaves and seedlings of the camalexin pathway mutants cyp71A13 and pad3 are also more susceptible to Pst infection than WT (Fig. 4a, b), suggesting a previously unrecognized role for camalexin in the antibacterial defence response. To test for a direct role of the ICN pathway metabolites in the plant innate immune response, either as inducible antibacterial or signalling compounds, we measured their protective effect against subsequent bacterial infection by infecting WT adult leaves with Pst after pre-immunizing them with pure compounds and flg22. Compared with a solvent control, pre-treatment with 4-OH-ICN (but not ICN or camalexin) conferred greater bacterial resistance (Fig. 4c), which supports a direct mechanism of action for 4-OH-ICN in inducible plant defence.
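Figure 4 summarizes replicate measurements as mean (or median) ± s.e.m. and compares lines with two-tailed t tests. As a hedged illustration (the caption does not say whether a pooled-variance or Welch test was used), the sketch computes the s.e.m. and a Welch t statistic with its degrees of freedom from hypothetical log10 c.f.u. values; a p-value would then come from the t distribution, for example via scipy.stats.ttest_ind(a, b, equal_var=False).

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample s.d. divided by sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for a vs b."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical log10(c.f.u.) values, four biological replicates per line.
wt = [5.1, 5.3, 5.0, 5.2]
mutant = [6.0, 6.2, 5.9, 6.1]
print(f"WT {mean(wt):.2f} ± {sem(wt):.3f}")
print(f"mutant {mean(mutant):.2f} ± {sem(mutant):.3f}")
t, df = welch_t(mutant, wt)
print(f"t = {t:.2f}, df = {df:.1f}")
```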

We also observed increased disease symptoms in adult leaves of the cyp82C2 mutant upon inoculation with spores from the avirulent fungal necrotroph Alternaria brassicicola (Extended Data Fig. 8e, f) and, consistent with a previous report**, the virulent necrotroph Botrytis cinerea (Extended Data Fig. 8a, b), but not from the obligate fungal biotroph Golovinomyces orontii (Extended Data Fig. 8c, d). Furthermore, purified ICN and 4-OH-ICN have a growth-inhibitory effect on B. cinerea and A. brassicicola comparable to that of camalexin’®* (Extended Data Fig. 9). However, we cannot rule out the possibility that the role of the 4-OH-ICN pathway in fungal defence is indirect, as adult leaves of the cyp82C2 mutant appear partly impaired in camalexin production after Alternaria treatment (Extended Data Fig. 10).

The camalexin and 4-OH-ICN pathways rely on a pair of paralogous genes, CYP71A12 and CYP71A13, which are members of the CYP71 family linked to innovations in plant metabolism” (Fig. 4d). Strikingly, the 4-OH-ICN pathway resembles the widespread cyanogenic glucoside pathway that has been lost in the Brassicaceae, and appears to be a metabolic re-invention leading to a novel cyanogenic metabolite type derived from Trp’. It is possible that 4-OH-ICN acts in concert with other Trp-derived metabolites, each contributing to protection against overlapping sets of specific pathogens. Collectively, our data provide additional insight into the Arabidopsis defence response and, more generally, into how plants use metabolic innovation to expand innate immunity.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
LETTER

Extended Data Figure 1 | Elicitation of compounds identified in metabolomics screen by flg22 peptide and origin of ICA methyl ester as artefact of the methanol extraction method. a, Levels of compounds in flg22-elicited Arabidopsis Col-0 seedling tissue, quantified as mean [M+H]+ ion (m/z ± 10 ppm) abundances extracted from raw data; error bars, s.d. based on three biological replicates. Production of these compounds in axenic plant culture demonstrates that they are plant-derived. b, c, Structure and mass peaks of ICA methyl ester (compound A1) seen in LC-MS analysis (b), and EICs for the expected m/z using a standard extraction with 80:20 CH3OH/H2O or with 80:20 CD3OD/D2O (c). d, e, Structure and mass spectrum peaks seen for the triply deuterated A1 analogue (d), and EICs for the expected m/z using extraction with 80:20 CH3OH/H2O or with 80:20 CD3OD/D2O (all EICs are to scale) (e). The presence of the deuterated analogue of ICA methyl ester and the complete absence of the non-deuterated compound in plant extracts when CD3OD is substituted for CH3OH show that the methyl ester is not a product of Arabidopsis metabolism, but arises from the extraction method as a degradation product of ICN.
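The ±10 ppm extraction windows used for these EICs scale with m/z. A minimal sketch, with hypothetical function names, of the window bounds and of the ppm error between an observed and a reference m/z:

```python
def ppm_window(mz: float, ppm: float = 10.0) -> tuple[float, float]:
    """Lower and upper m/z bounds of a +/- ppm extraction window centred on mz."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


def ppm_error(observed: float, reference: float) -> float:
    """Deviation of an observed m/z from a reference m/z, in ppm."""
    return (observed - reference) / reference * 1e6


low, high = ppm_window(171.0553)  # ICN [M+H]+ quoted in the main text
print(f"{low:.5f} to {high:.5f}")  # 171.05359 to 171.05701
print(f"{ppm_error(171.0550, 171.0553):.2f} ppm")
```

For example, the 171.0550 parent ion quoted for the ICN MS/MS spectrum lies about 1.8 ppm below 171.0553, well inside such a window.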
Extended Data Figure 2 | Comparison of spectra for plant-extracted and synthetic compound establishes identity of ICN as new indolic metabolite produced by A. thaliana. a, Full-range (δ = 10.5 to −0.5) and, b, downfield region partial (δ = 8.5-7.0) ¹H NMR spectra in CD3CN. Upfield contaminants in the full-range spectra are presumed to be residual solvent. c, Ultraviolet-visible absorbance spectra obtained via a diode array detector during LC analysis. Note that the prominent peak at 230 nm is due to acetonitrile in the LC mobile phase. d, Targeted MS/MS spectra for the parent ICN [M+H]+ ion (m/z = 171.0550) at a collision energy of 20 V. See Supplementary Table 1 for relative peak intensities at other collision energies. e, Aligned EICs for the ICN [M+H]+ ion for a Col-0 + Psta tissue sample extracted with DMSO and synthetic compound, showing identical retention times.
Extended Data Figure 3 | Comparison of plant-extracted ICN derivatives, 4-OH-ICN derivatives, and synthetic standards shows identical column elution times for all compounds. Col-0 + Psta combined EICs were extracted for the relevant compound [M+H]+ m/z values for a DMSO-extracted medium sample (4-OH-ICN trace), or a MeOH-extracted seedling tissue sample (all other traces), while synthetic EICs were extracted for a mixed standard in DMSO. Note that chromatograms are not to scale, and the synthetic standard is not equimolar with respect to all compounds because of partial degradation.
Extended Data Figure 4 | CYP82C2 is an ICN 4-hydroxylase. a, b, ¹H NMR spectra in CD3OD of synthetic ICA (a) and 4-OH-ICA (b). c, Spectrum for large-scale enzymatic reaction extract of ICN incubated with CYP82C2. In addition to ICA, resulting from hydrolysis of ICN, peaks for a singly hydroxylated analogue of ICA are seen; these are qualitatively consistent with, but shifted slightly upfield (~30-60 Hz) from, the 4-OH-ICA spectrum, possibly because of impurities or a pH effect in the enzymatic reaction sample. d, To confirm the identity conclusively, 80 µg of 4-OH-ICA dissolved in CD3OD was added to the enzymatic reaction NMR sample before acquiring another spectrum: no new peaks are seen, while the prior hydroxylated ICA peaks grow in intensity, establishing the product of the enzymatic reaction as 4-OH-ICA. e, EICs for enzymatic reactions of CYP82C2 on ICN or ICA, or empty vector control incubation with ICN. Only trace amounts of the expected 4-OH-ICN product but significant amounts of 4-OH-ICA are seen for the CYP82C2/ICN reaction. No hydroxylated products are seen for the CYP82C2/ICA or empty vector/ICN reactions, indicating that CYP82C2 catalyses only the hydroxylation of ICN to 4-OH-ICN, but 4-OH-ICA is seen as the predominant end product due to rapid hydrolysis of 4-OH-ICN (f). Chromatograms in this figure were obtained using the 20 min LC-MS gradient (see Supplementary Information, Methods section 1.9 LC-MS analysis).
Extended Data Figure 5 | Levels of numerous Arabidopsis indolic metabolites are altered in ICN pathway gene insertion lines compared with WT plants. a-e, Relative compound levels for mock treatment condition and indicated pathway insertion line mutants, and, f, absolute levels in Psta-treated WT (Col-0) seedlings. For a-e, data bars represent a logarithmically scaled ratio of mean metabolite levels in the indicated line or treatment condition, quantified as [M+H]+ ion abundances by LC-MS analysis with XCMS processing, to levels in Psta-treated WT Arabidopsis seedlings. In f, absolute levels for all compounds except RA were quantified by measuring [M+H]+ ion abundances and comparing to standard curves. Error bars, s.d., based on six biological replicates. Cam, camalexin; RA, raphanusamic acid; other abbreviations as detailed previously.
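The caption does not state the logarithm base used for the scaled ratios; the sketch below assumes log2 and uses hypothetical abundances. It also shows how a level of ~10% of WT, as reported for most ICN derivatives in cyp71A12, corresponds to a tenfold reduction.

```python
import math
from statistics import mean


def log_ratio(sample, reference, base=2.0):
    """log_base of mean(sample) / mean(reference); negative means lower than reference."""
    return math.log(mean(sample) / mean(reference), base)


def fold_reduction(sample, reference):
    """How many times lower the sample mean is than the reference mean."""
    return mean(reference) / mean(sample)


# Hypothetical ion abundances (arbitrary units), six biological replicates each.
reference = [100, 96, 104, 98, 102, 100]
mutant = [10, 11, 9, 10, 10, 10]
print(round(log_ratio(mutant, reference), 2), round(fold_reduction(mutant, reference), 1))
# -3.32 10.0
```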
Extended Data Figure 6 | Putative ICGs observed in Arabidopsis and in N. benthamiana expressing ICN pathway enzymes. a, EICs for putative ICGs in WT Arabidopsis and fox1 mutant elicited with Psta. The m/z values shown are median values calculated by XCMS. b, Hypothesized structures and theoretical m/z values for the two ICGs identified. c, MS/MS spectrum for ICG1; m/z values and relative abundances are shown above each peak. The ion analysed here (m/z = 691.2210) represents a [2M+Na]+ dimer that is significantly more abundant than the [M+Na]+ ion. Direct analysis of the [M+Na]+ ion (m/z = 357.1057) yielded low-abundance spectra that could not be easily analysed. At lower collision energies, the [2M+Na]+ ion fragments to [M+Na]+, but it yields a rich spectrum at 40 V, which is shown. Predicted peak assignments for the ICG1 MS/MS spectrum are shown in the accompanying table. For peaks in bold, exact counterparts could be identified in the dhurrin [M+Na]+ 20 V MS/MS spectrum in the METLIN metabolite database. d, MS/MS spectrum obtained for the ICG2 [M+Na]+ ion and predicted peak assignments. While the [2M+Na]+ peak (m/z = 864.2225) is also seen for this compound (not shown), [M+Na]+ is more abundant in this case, and was analysed directly. e, f, Levels of ICG1 and ICG2 in ICN pathway mutants (e) and in WT plants elicited with Psta and N. benthamiana expressing ICN pathway enzymes (f). For e and f, levels are quantified as mean [M+Na]+ ion (m/z ± 10 ppm) abundances extracted from raw data; error bars, s.d., based on six biological replicates.
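Whether the [2M+Na]+ and [M+Na]+ ions quoted for ICG1 derive from the same neutral species can be checked arithmetically. A sketch, assuming the standard Na+ ion mass (sodium atom minus one electron); the function names are illustrative:

```python
NA_ION = 22.98922070  # mass of Na+ (Da): sodium atom minus one electron


def neutral_from_na_adduct(mz_na: float) -> float:
    """Neutral monoisotopic mass M implied by an observed [M+Na]+ m/z."""
    return mz_na - NA_ION


def dimer_na_adduct(m: float) -> float:
    """Expected m/z of the singly charged [2M+Na]+ ion for neutral mass m."""
    return 2 * m + NA_ION


m_icg1 = neutral_from_na_adduct(357.1057)  # ICG1 [M+Na]+ quoted in panel c
expected = dimer_na_adduct(m_icg1)
error_ppm = (691.2210 - expected) / expected * 1e6
print(f"M = {m_icg1:.4f}; expected [2M+Na]+ = {expected:.4f}; error = {error_ppm:.1f} ppm")
```

The discrepancy between the observed and expected dimer m/z comes out under 2 ppm, which is consistent with (though not proof of) the dimer assignment.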
Extended Data Figure 7 | ICN pathway metabolites are produced in N. benthamiana transiently expressing pathway genes. Levels of ICN and 4-OH-ICN derivatives (left axis) and other relevant indolic compounds (right axis), quantified as mean [M+H]+ ion (m/z ± 10 ppm) abundances extracted from raw data; error bars, s.d., based on six biological replicates. The set of transiently expressed genes is indicated for each panel. Background levels of ICA and IAL detected when only the early pathway genes CYP71A12 and/or CYP79B2 are expressed indicate potential involvement of endogenous N. benthamiana enzymes.
Extended Data Figure 8 | ICN pathway metabolites contribute to disease resistance towards B. cinerea but not towards G. orontii. a, Top: typical lactophenol trypan blue staining of leaves drop-inoculated with spores from the virulent fungal necrotroph B. cinerea to visualize the extent of host cell death (darkly stained areas within and beyond the fungal spore droplet region). Middle: microscopic analysis of stained leaves to visualize the extent of fungal colonization (stained filamentous fungal hyphae within and beyond the fungal spore droplet region). Images were taken at the same magnification (×25) and are representative of five biological replicates. Bottom: close-up images of the fungal hyphae beyond the fungal spore droplet region for cyp82C2 and cyp71A13 mutants. Images were taken at the same magnification (×100). b, Measurement of the disease lesion diameters in infected leaves. Data represent the median ± s.e.m. for five biological replicates. Asterisks denote statistical significance relative to WT (P < 0.05, two-tailed t test). c, Typical lactophenol trypan blue staining of fungal conidiophores (spore-bearing structures) formed in leaves infected with the adapted powdery mildew G. orontii. The pad4-1 mutant is more susceptible to fungal growth by G. orontii and thus produces significantly more conidiophores. Images were taken at the same magnification (×100) and are representative of three biological replicates. d, Measurement of the number of conidiophores in infected leaves. Data represent the mean ± s.d. for three biological replicates. e, Top: typical disease symptoms 3 days after drop inoculation of leaves with spores from the avirulent fungal necrotroph A. brassicicola. Bottom: microscopic analysis of infected leaves after lactophenol trypan blue staining, confirming that disease symptoms are consistent with the extent of fungal colonization (lightly stained fungal hyphae extending from the fungal spore droplet region) and host cell death (darkly stained areas along and beyond the border of the spore droplet region). Images were taken at the same magnification (×25) and are representative of ten biological replicates. f, Measurement of the disease lesion diameters in infected leaves. Data represent the median ± s.e.m. of eight (top graph) or ten (bottom graph) biological replicates. Different letters denote statistically significant differences (P < 0.05, two-tailed t test).
Extended Data Figure 9 | ICN and 4-OH-ICN but not their degradation products inhibit fungal growth in vitro. a, b, Fungal growth inhibition assays on B. cinerea SF1 (a) or A. brassicicola FSU218 (b) with the tested compound (or compound combination) indicated. For compound combinations, the concentration indicated is for each compound; the given combinations approximate the hydrolysis products of ICN or 4-OH-ICN. Growth of fungi in potato dextrose broth on a microplate was quantified by measuring absorbance at 600 nm (OD600) 72 h after spore inoculation and subtracting the absorbance at 0 h; see Methods for further details. Error bars, s.d. based on three biological replicates. Note that the half-maximal inhibitory concentrations (IC50) for both camalexin and ICN are approximately 25 µM against B. cinerea and 50 µM against A. brassicicola. For 4-OH-ICN, the inhibitory effect is not as pronounced, possibly because of rapid degradation of 4-OH-ICN in potato dextrose broth (see Supplementary Table 2).
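One simple way to read an approximate IC50 from growth data of this kind is log-linear interpolation between the two concentrations that bracket 50% of control growth. The caption does not describe the fitting procedure, so this is a hedged sketch with hypothetical OD600 values and function names; a four-parameter logistic fit is a common alternative.

```python
import math


def growth(od_72h: float, od_0h: float) -> float:
    """Background-corrected growth: OD600 at 72 h minus OD600 at 0 h."""
    return od_72h - od_0h


def ic50_log_interpolate(concs, growths, control_growth):
    """Estimate the concentration giving 50% of control growth by linear
    interpolation on a log10 concentration axis. Assumes concs are in
    increasing order; returns None if 50% is never bracketed."""
    rel = [g / control_growth for g in growths]
    pairs = list(zip(concs, rel))
    for (c1, r1), (c2, r2) in zip(pairs, pairs[1:]):
        if r1 >= 0.5 >= r2 and r1 != r2:
            frac = (r1 - 0.5) / (r1 - r2)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None


# Hypothetical readings: solvent control and five compound concentrations (uM).
control = growth(0.85, 0.05)
concs = [6.25, 12.5, 25.0, 50.0, 100.0]
growths = [0.76, 0.56, 0.32, 0.08, 0.04]
print(round(ic50_log_interpolate(concs, growths, control), 1))  # 19.8
```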
Extended Data Figure 10 | Levels of indolic compounds in leaves of mature plants after mock treatment or fungal infection. Tissue extracts were analysed by LC-MS 7 days post-infection for A. brassicicola FSU218 and 5 days post-infection for B. cinerea SF1. a-e, Levels of indicated compound, quantified as EIC integral for the [M+H]+ ion (m/z ± 10 ppm) and converted to absolute amounts by comparison with a standard curve. f, Ion count integrals for indole glucosinolates ([M−H]− ion, m/z ± 10 ppm). Error bars in all panels, s.d. based on six biological replicates.
Erosion of the chronic myeloid leukaemia stem cell pool by PPARγ agonists
Whether cancer is maintained by a small number of stem cells or is composed of proliferating cells with approximate phenotypic equivalency is a central question in cancer biology’. In the stem cell hypothesis, relapse after treatment may occur by failure to eradicate cancer stem cells. Chronic myeloid leukaemia (CML) is quintessential to this hypothesis. CML is a myeloproliferative disorder that results from dysregulated tyrosine kinase activity of the fusion oncoprotein BCR-ABL’. During the chronic phase, this sole genetic abnormality (chromosomal translocation Ph+: t(9;22)(q34;q11)) at the stem cell level causes increased proliferation of myeloid cells without loss of their capacity to differentiate. Without treatment, most patients progress to the blast phase, when additional oncogenic mutations result in a fatal acute leukaemia made up of proliferating immature cells. Imatinib mesylate and other tyrosine kinase inhibitors (TKIs) that target the kinase activity of BCR-ABL have markedly improved patient survival. However, fewer than 10% of patients reach the stage of complete molecular response (CMR), defined as the point when BCR-ABL transcripts become undetectable in blood cells’. Failure to reach CMR results from the inability of TKIs to eradicate quiescent CML leukaemia stem cells (LSCs)”*. Here we show that the residual CML LSC pool can be gradually purged by the glitazones, antidiabetic drugs that are agonists of peroxisome proliferator-activated receptor-γ (PPARγ). We found that activation of PPARγ by the glitazones decreases expression of STAT5 and its downstream targets HIF2α° and CITED2°, which are key guardians of the quiescence and stemness of CML LSCs. When pioglitazone was given temporarily to three CML patients with chronic residual disease despite continuous treatment with imatinib, all of them achieved sustained CMR, lasting up to 4.7 years after withdrawal of pioglitazone. This suggests that clinically relevant cancer eradication may become a generally attainable goal through combination therapy that erodes the cancer stem cell pool.

Cell division tracking with carboxyfluorescein diacetate-succinimidyl ester (CFSE) indicates that non-cycling CML cells are poorly sensitive to TKIs”* and that the quiescent TKI-resistant subpopulation is enriched in CD34+CD38− cells’. CML LSCs are hence similar to normal quiescent haematopoietic stem cells (HSCs), although they are cytokine-independent’. Because failure to reach CMR occurs even when BCR-ABL remains sensitive to TKIs’, we searched for a possible 'non-oncogene addiction' (NOA) of CML LSCs as a novel therapeutic target. NOA indicates that a given malignant cell is abnormally sensitive to quantitative variations in an otherwise normal molecular pathway’®.

We previously reported that the Nef proteins of the immunodeficiency viruses impair haematopoiesis by activating peroxisome proliferator-activated receptor gamma (PPARγ)'’. This effect was reproduced by the thiazolidinediones, a class of synthetic PPARγ ligands (Extended Data Fig. 1a), although it is compensated in individuals with otherwise normal haematopoiesis’”. We then became intrigued by our observation that the CML cell line K562 is particularly sensitive to Nef and thiazolidinediones''. The involvement of PPARγ in the haematopoietic stress response has also been reported more recently’’.

We turned to a cohort of 29 chronic phase (CP) CML patients at diagnosis whose CD34+ cells were >95% Ph+. Combining imatinib and pioglitazone showed evidence of synergy, with a decrease in the number of colony-forming cells (CFCs) sixfold more pronounced (P < 0.0001) than with imatinib alone (Extended Data Fig. 2a). A similar trend was observed when normal CD34+ cells were transduced with a lentiviral vector expressing p210 BCR-ABL (Extended Data Fig. 2b). Whereas imatinib alone did not significantly reduce the frequency of CP-CML long-term culture-initiating cells (LTC-ICs) (P = 0.067), pioglitazone did so, either as a single agent by 2.4-fold (P = 0.008) or, with an improved effect, by 3.5-fold in the presence of imatinib (P < 0.001) (Fig. 1a, b). Similar results were obtained with the second-generation TKI dasatinib or with another thiazolidinedione, rosiglitazone (Extended Data Fig. 2c, d).
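LTC-IC frequencies from limiting dilution are usually estimated with the single-hit Poisson model, in which the fraction of negative wells at N cells per well is F0 = exp(-fN). The sketch below uses a simple least-squares fit through the origin rather than the maximum-likelihood methods typically used in practice; the cell doses and frequencies are hypothetical, chosen only to reproduce a 2.4-fold reduction like the one reported for pioglitazone.

```python
import math


def ltcic_frequency(cells_per_well, fraction_negative):
    """Single-hit Poisson estimate of LTC-IC frequency f.

    Model: fraction of negative wells F0 = exp(-f * N), so -ln(F0) = f * N.
    A least-squares line through the origin gives f = sum(N*y) / sum(N^2),
    with y = -ln(F0). Every F0 must lie in (0, 1].
    """
    ys = [-math.log(f0) for f0 in fraction_negative]
    numerator = sum(n * y for n, y in zip(cells_per_well, ys))
    denominator = sum(n * n for n in cells_per_well)
    return numerator / denominator


# Hypothetical doses and negative-well fractions generated from known frequencies.
doses = [250, 500, 1000, 2000]
untreated = [math.exp(-n / 500) for n in doses]   # 1 LTC-IC per 500 cells
treated = [math.exp(-n / 1200) for n in doses]    # 1 LTC-IC per 1200 cells
f_u = ltcic_frequency(doses, untreated)
f_t = ltcic_frequency(doses, treated)
print(f"1 in {1 / f_u:.0f} vs 1 in {1 / f_t:.0f}; fold reduction {f_u / f_t:.1f}")
```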

CFSE assays were then performed with CP-CML CD34+ cells in the absence of cytokines (Fig. 1c-e and Extended Data Table 1). Untreated control CP-CML CD34+ cells proliferated and differentiated actively. Imatinib exposure resulted in the elimination of actively dividing cells but also in the accumulation of viable CFSE-bright CD34+ cells that never divided ('P') or had divided only once (Fig. 1d). Pioglitazone alone was less effective than imatinib at depleting the bulk of dividing CML cells but triggered exit from quiescence (Fig. 1c-e and Extended Data Table 1). Combining pioglitazone with either imatinib or dasatinib acted in synergy to deplete both proliferating and non-proliferating cells (Fig. 1c-e, Extended Data Table 1 and Extended Data Fig. 2e). Imatinib alone was effective at decreasing the number of Ph+ CD34+CD38+ progenitors but failed to reduce the more immature CD34+CD38− population, the opposite of pioglitazone alone (Extended Data Fig. 3b).
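CFSE is shared between daughter cells, so fluorescence roughly halves with each division and the division number is approximately log2 of the parent intensity over the cell's intensity. This is a hedged sketch with hypothetical intensities and function names; real analyses fit peaks on flow cytometry histograms rather than rounding each cell individually.

```python
import math


def division_number(intensity: float, parent_intensity: float) -> int:
    """Divisions implied by CFSE dilution: dye halves per division, so
    divisions ~= log2(parent / intensity), rounded to the nearest peak."""
    return max(0, round(math.log2(parent_intensity / intensity)))


def division_distribution(intensities, parent_intensity):
    """Percentage of cells assigned to each division peak (0 = undivided, 'P')."""
    counts = {}
    for x in intensities:
        k = division_number(x, parent_intensity)
        counts[k] = counts.get(k, 0) + 1
    return {k: 100 * v / len(intensities) for k, v in sorted(counts.items())}


# Hypothetical per-cell CFSE intensities; parent (colcemid-arrested) peak at 8000.
cells = [8100, 7800, 4100, 3900, 2000, 1000, 950, 510]
print(division_distribution(cells, 8000))
```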

Figure 1 | Pioglitazone purges quiescent CML stem cells. a, Limiting dilution analysis (LDA) of CML LSCs by LTC-IC assay. Pio, pioglitazone. b, LTC-IC frequencies calculated from a (n = 4). c, CFSE analysis (patient 4) after liquid culture in serum-free medium without cytokines. P (red), colcemid-arrested 'parent cells'. d, Distribution (%) of CD34+ cells in each mitosis peak shown in c. e, Identical culture conditions as in c, but for patient 2. Left scale (black) and histograms show cell counts. Right scale (red) and red dots and lines show the number of undivided CD34+ cells (P in CFSE assay). Also see Extended Data Table 1 (n = 6). See statistics in Methods.

We then investigated the possible molecular pathways that mediate pioglitazone activity against CML LSCs. We previously reported that PPARγ is a negative transcriptional regulator of STAT5 (A and B)"’. STAT5 is known to be critical for maintenance and fitness of both normal HSCs" and CML cells, in which STAT5 is activated upon direct phosphorylation by the BCR-ABL kinase’*. STAT5 expression levels were abnormally high in both total CP-CML CD34+ cells and quiescent LSCs (Fig. 2a). In CFSE-bright cells (that is, P and 1 division of CP-CML CD34+ cells) purified at 14 days of culture without cytokines, STAT5B messenger RNA levels decreased by 8.5-fold (P < 0.0001), 1.5-fold (P = 0.08) and 10.5-fold (P < 0.0001) in the presence of pioglitazone, imatinib and the drug combination, respectively (Fig. 2a). Similar values were obtained for STAT5A (not shown).
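The normalization behind these RT-qPCR fold changes is not described in this section. Assuming the common 2^-ΔΔCt approach with ~100% amplification efficiency, a fold decrease can be computed as below; the Ct values, the reference gene and the function names are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Treated/control expression ratio by the 2^-ddCt method (assumes ~100% PCR efficiency)."""
    d_ct_treated = ct_target_treated - ct_ref_treated  # normalise to a reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** -dd_ct


def fold_decrease(ratio: float) -> float:
    """Express a ratio below 1 as an 'N-fold decrease'."""
    return 1 / ratio


# Hypothetical Ct values: target gene vs a reference gene in treated and control cells.
ratio = relative_expression(27.1, 18.0, 24.0, 18.0)
print(f"{ratio:.3f} of control = {fold_decrease(ratio):.1f}-fold decrease")
print(f"3.3-fold decrease = {100 / 3.3:.0f}% of control")
```

The second line shows the conversion used when reading fold reductions such as the 3.3-fold drop in BCL2L1 as a percentage of control expression.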

We then compared mRNA levels of four known STAT5 target genes. Addition of pioglitazone to imatinib significantly reduced expression of BCL2L1 (also known as BCL-XL)'* (3.3-fold), BCL2'° (4.8-fold), PIM1'’ (1.6-fold) and CISH (also known as CIS)'® (1.6-fold), suggesting that imatinib alone is not able to inhibit STAT5 transcriptional activity completely (Fig. 2b). Supplementary studies with the bromodomain inhibitor JQ1” confirmed the pivotal role played by STAT5 in CML LSCs (Supplementary Data and Extended Data Fig. 2f).

The effect of pioglitazone was negated by a short interfering RNA
(siRNA) against PPARγ (also known as PPARG) mRNA (Fig. 2c; controls in
Extended Data Fig. 1b, c). Decreased clonogenicity of CP-CML
CD34+ cells in the presence of pioglitazone was abolished when
STAT5B was overexpressed after lentiviral transfer (Fig. 2d). We
observed that imatinib acts rapidly (within minutes) by preventing
STAT5 phosphorylation, whereas pioglitazone acts slowly (over days)
by decreasing STAT5 protein levels (Extended Data Fig. 3a, c and d),
owing to the known long half-life of STAT5 protein despite the
rapid decrease in its mRNA levels. Pioglitazone activity in CFSE
assays with CP-CML cells was abrogated when STAT5B was
overexpressed by lentiviral transfer (Fig. 2e, Extended Data Fig. 4a-e and
Extended Data Fig. 1c). Importantly, pioglitazone was found to be
more inhibitory/cytotoxic for CML LTC-ICs than for normal
LTC-ICs (Extended Data Fig. 5).
Figure 2 | Pioglitazone targets the PPARγ-STAT5 pathway in CML LSCs.
a, Normalized STAT5B quantitative PCR with reverse transcription
(RT-qPCR) on CFSE-bright cells (that is, P and 1 division) at 14 days of culture.
b, Percent mRNA expression of STAT5 target genes in CFSE-bright cells
after drug exposure. c, CP-CML CD34+ cells cultured with an anti-PPARγ
siRNA before RT-qPCR. d, Colony-forming cell (CFC) assays with CP-CML
CD34+ cells after transduction with enhanced green fluorescent protein
(eGFP; negative control)- or STAT5B-expressing lentivectors (Lv). e, Absolute
cell count together with CFSE analysis (patient 2 in triplicate). Red dots,
undivided CD34+ cells (P in CFSE assay). Data show means ± s.d., n = 5. See
statistics in Methods.


We then examined, in 7-day cultures without cytokines of CP-CML
CD34+ cells from 11 patients, mRNA expression levels of 9 putative
downstream transcriptional targets of STAT5 and/or PPARγ (Fig. 3a).
These included OCT1 (also known as POU2F1), PML, SIRT1,
ALOX5, STAT3, MDR1 (also known as ABCB1), GLUT1 (also known
as SLC2A1), β-catenin (also known as CTNNB1) and HIF2α (also
known as EPAS1). CD36, known to be upregulated by PPARγ
agonists, was used as a positive control. Only OCT1 and HIF2α
expression levels were significantly altered after culture in the presence of
pioglitazone + imatinib versus imatinib alone (Fig. 3a). Although
upregulation of OCT1 expression may increase the cellular uptake of
imatinib, we found that erosion of the CP-CML LSC pool was not
improved in the presence of imatinib alone when OCT1 was
overexpressed by lentiviral transfer (Extended Data Fig. 6).

In contrast to OCT1, HIF2α was downregulated by pioglitazone
(Fig. 3a). Importantly, HIF2α and, to a lesser degree, HIF1α were
upregulated in imatinib-resistant CFSE-bright cells (P and 1 cell
division), whereas pioglitazone counteracted this phenomenon (Fig. 3b).
CITED2, a key gene of HSC “stemness” known to be upregulated
by HIF2α, followed the same trend (Fig. 3b). The viability of undivided
and imatinib-resistant CP-CML CD34+ cells required HIF2α
expression (Fig. 3c, Extended Data Fig. 7a-e and Extended Data Fig. 1c), and
forced expression of HIF2α in normal human cord blood CD34+ cells
increased the compartment of quiescent cells (Fig. 3d, Extended Data
Fig. 7f-i). Furthermore, forced expression of BCR-ABL in normal
human CD34+ cells induced HIF2α in a STAT5-dependent manner
(Extended Data Fig. 8a). Similarly, forced expression of constitutively
active mutants of STAT5A or STAT5B also induced HIF2α expression
(Extended Data Fig. 8b). In both cases, induction of HIF2α was
associated with upregulated expression of its target gene, CITED2 (Extended


17 SEPTEMBER 2015 | VOL 525 | NATURE | 381 


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Figure 3 image. a, mRNA fold amplification relative to imatinib alone for CD36, OCT1, PML1, SIRT1, ALOX5, STAT3, GLUT1, MDR1, β-catenin and HIF2α. b, Purified CFSE-bright cells: mRNA fold amplification relative to untreated cells after imatinib or imatinib + Pio, including CITED2. c, Absolute cell counts and CFSE analysis with control, HIF2α or STAT5 siRNA, with or without imatinib and Pio. d, Cord blood CD34+ cells transduced with LvGFP or LvHIF2α.]


Figure 3 | Expression of target genes in CP-CML cells exposed to
pioglitazone and imatinib. a, RT-qPCR assays on CP-CML CD34+ cells after
7 days without serum or cytokines. Data show means ± s.d., n = 11. b,
RT-qPCR assays in purified 12-14-day CFSE-bright cells (that is, P and 1 division).
Data show means ± s.d., n = 6. c, Absolute cell count together with CFSE
analysis of representative CP-CML patient 8 (triplicate). Undivided cells (red
dots). Also see Extended Data Fig. 7 and Extended Data Fig. 1c. d, Same as in
c, but after transduction of cord blood CD34+ cells with lentivectors (Lv)
(triplicate). Undivided cells (red dots). See statistics in Methods.


Data Fig. 8a, b). Because CITED2 is a known master gene of HSC
quiescence that regulates stemness-associated genes such as BMI1,
HES1 and p57 (also known as CDKN1C), we studied the expression
of these genes in CD34+ cells from CP-CML patients and in murine
Ba/F3 cell lines that we generated to express, through retroviral vector
transduction and ubiquitous promoter-driven expression, the
constitutively active forms of murine Stat5a or Stat5b 1*6 (H299R, S711F).
After 10 days of culture in the presence of imatinib, TKI-resistant
CD34+ cells from CP-CML patients showed an increase in endogenous
expression of both CITED2 itself (4.5-fold) and the known CITED2
target genes BMI1 (2.8-fold), HES1 (3.1-fold) and p57 (16.5-fold).
Addition of pioglitazone fully counteracted this increase in CITED2,
BMI1 and HES1 expression and reduced the increase in p57 expression
fourfold (Extended Data Fig. 9a). Ba/F3 cell studies corroborated these
findings (Extended Data Fig. 9b and c). Taken together, we propose
that the CML LSC is critically dependent (NOA) on a
PPARγ-STAT5-HIF2α-CITED2 pathway that is directly and effectively inhibited by
pioglitazone (Fig. 4a), thus extending the contention that equivalent
murine leukaemias are addicted (NOA) to STAT5 (ref. 15).

Because mouse models are poorly suited to the investigation of CML LSCs,
and pioglitazone is an approved drug for the treatment of type 2 diabetes
mellitus in humans, we initially sought to validate pioglitazone
directly in two patients diagnosed with both diabetes and CML who
had never reached CMR despite long-term imatinib treatment. Before
filing a formal clinical trial application, we prescribed pioglitazone
off-label, under approved informed consent, to a third CML patient,
this time non-diabetic, who had likewise never reached CMR under
long-term imatinib therapy (Fig. 4b).

Pioglitazone was added to the treatment after 5, 6 and 4 years of
uninterrupted imatinib therapy for patients 1, 2 and 3, respectively.
None of the 3 patients had ever reached CMR before introduction of
Figure 4 | Pioglitazone induces complete and sustained molecular response
(CMR) in CML patients. a, Model of CML LSC addiction to the
PPARγ-STAT5-HIF2α pathway. Top inset: for the bulk of dividing CML cells,
imatinib (I) alone is able to bring phospho-STAT5 levels below a threshold
(dotted line) at which apoptosis occurs (cross). For CML LSCs, only the
combination of imatinib and pioglitazone (I + P) is able to bring cells below a
threshold at which cells leave their state of quiescence before undergoing
apoptosis. b, RT-qPCR assays for BCR-ABL/ABL on nucleated blood cells from
the first three patients. See Supplementary Information for details.


pioglitazone. Pioglitazone was added to the treatment of patient 1
during two brief exposures of 10 and 8 months, separated by an interval
of 28 months (Fig. 4b). CMR was achieved 10 months after initial
pioglitazone addition, and patient 1 has remained in CMR for at least
56 months, the last time point collected for this study, which is 53
months (4.5 years) after pioglitazone administration was first stopped
(Fig. 4b). For patient 2, CMR was obtained after 1 year of pioglitazone
addition and maintained for 32 months, at which time the patient withdrew
(Fig. 4b). For patient 3, CMR was achieved after 6 months of
pioglitazone addition. At this time point, the level of STAT5 mRNA in
CD34+ cells from the bone marrow of patient 3 was decreased by
11.9-fold. Patient 3 has remained in CMR for at least 38 months, the
last time point collected for this study, which is 28 months after
pioglitazone administration was stopped (Fig. 4b). Furthermore, patient 3
decided to stop imatinib for the last 6 months of this
observation period and has remained in CMR during this period
without any treatment (Fig. 4b and Supplementary Data).

Regulatory approval was then obtained for multi-centre Phase II
clinical trials, the first of which (EudraCT 2009-011675-79) aimed at
assessing the short-term cumulative incidence of CMR conversion
for patients who never reached CMR under imatinib alone
(https://www.clinicaltrialsregister.eu/ctr-search/trial/2009-011675-79/FR#E).
Scoring by quantitative PCR was performed over the first
12 months after trial initiation, during concurrent and brief exposure
(3 to 12 months) to the imatinib-pioglitazone combination. Of the
24 assessable patients, the cumulative incidence rate in the treated
group reached 57% versus 27% (P = 0.02) for a historical group of
patients who had received imatinib alone, indicating that clinical
evidence of efficacy can be detected even after very brief
treatment and early analysis. Post-trial follow-up confirmed the stability
of CMR status in the data collected to date. Therapy with pioglitazone was
accompanied by a stable reduction of STAT5 mRNA in patient
samples as early as month 6 (2.3-fold, P = 0.0003) and by a reduction of
the clonogenic potential of bone marrow CD34+ cells (1.54-fold,
P = 0.0003).

Although both imatinib and pioglitazone decrease STAT5 activity,
they act by different mechanisms. Imatinib prevents STAT5 activation
by blocking its phosphorylation by BCR-ABL, whereas pioglitazone decreases
STAT5 expression. It seems that imatinib alone is sufficient to induce
effective clearance of the bulk of more differentiated CML cells, but
fails to bring STAT5 activity below the threshold at which CML LSCs exit
quiescence and subsequently undergo apoptosis. Pioglitazone achieves this
in synergy with imatinib (Fig. 4a, top inset). As in
this example with CML, progressive erosion of cancer stem cell pools
may prove ultimately achievable pharmacologically, bringing hope of
cancer eradication in a variety of human malignancies through
combination therapy.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Reagents. For in vitro assays, PPARγ agonists were provided by Cayman
Chemical (PPARγ-PAK; Bertin Pharma). Imatinib mesylate was provided by
Novartis and was used at 1 µM in culture, a well-established inhibitory
concentration in vitro that also approaches the achievable drug level in patients' plasma.
Dasatinib (Bristol-Myers Squibb) and JQ1 (Sigma) were used in culture at
0.146 µM and 1 µM, respectively. The murine pro-B cell line Ba/F3
and the human chronic myelogenous leukaemia cell line K562 were provided by the
American Type Culture Collection (ATCC; CRL-12015 and CCL-243,
respectively). These cell lines were tested for mycoplasma contamination every
3 months using the Venor Gem Advance Pre-aliquoted Mycoplasma Detection Kit
(Minerva Biolabs).

Cell culture and proliferation assays. CD34+ cells from patients in CP-CML at
diagnosis or from umbilical cord blood were immunoselected (CD34 MicroBead Kit,
Miltenyi Biotec) according to the manufacturer's instructions. Enrichment for
CD34+ cells was ascertained by flow cytometry using an anti-CD34 monoclonal
antibody (clone 581; BD Pharmingen). Ph1+ CD34+ cells were cultured in serum-free
medium (SFM) StemSpan (StemCell Technologies) without growth factors.

Colony-forming cell (CFC) and long-term culture-initiating cell (LTC-IC)
assays. For CFC assays, CD34+ cells were suspended (1 X 10*) in 3 ml of
alpha-MEM-based methylcellulose medium (GF H4434, StemCell Technologies). Cells
were scored and collected after 14 days of incubation at 37 °C and 5% CO2. After
scoring, colonies were washed with PBS and kept frozen in RNAlater (Invitrogen)
for subsequent analysis. LTC-IC limiting dilution assays (LDA) were
performed in StemSpan SFM (StemCell Technologies) on irradiated MS5 monolayers
at several dilutions of CD34+ cells (300, 150, 75 or 37 cells per well for Ph1+
CD34+ cells and 200, 100, 50 or 25 cells per well for CD34+ cells from healthy donors)
in 96-well plates with 16 replicate wells per concentration. After five weeks, with
weekly replacement of half the medium volume, all cells were transferred into alpha-MEM-based
methylcellulose medium (GF H4434, StemCell Technologies) to determine the
total clonogenic cell content of each LTC. LTC-IC frequencies were determined
using the L-Calc software (StemCell Technologies).
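L-Calc derives LTC-IC frequencies from the proportion of negative wells at each cell dose. The sketch below illustrates the underlying idea under an assumed standard single-hit Poisson model: the fraction of negative wells at d cells per well is exp(−f·d), and the frequency f is estimated by maximum likelihood. L-Calc's exact algorithm may differ. The function name and the well counts used in the test are hypothetical.

```python
import math


def ltcic_frequency(doses, n_wells, n_negative, lo=1e-9, hi=1.0, iters=200):
    """Maximum-likelihood LTC-IC frequency under a single-hit Poisson model.

    Assumes P(well negative | d cells seeded) = exp(-f * d), where f is the
    LTC-IC frequency. Requires at least one positive and one negative well
    overall; otherwise the estimate is unbounded.
    """
    def score(f):
        # Derivative of the log-likelihood with respect to f. It decreases
        # monotonically in f, so its single root is the ML estimate.
        s = 0.0
        for d, n, neg in zip(doses, n_wells, n_negative):
            pos = n - neg
            s -= neg * d
            x = f * d
            if x < 700:  # beyond this, the positive-well term is ~0
                # d * exp(-x) / (1 - exp(-x)) == d / expm1(x)
                s += pos * d / math.expm1(x)
        return s

    # Bisection on the score function.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `f = ltcic_frequency([300, 150, 75, 37], [16] * 4, [1, 4, 9, 12])` uses the Ph1+ dilution series with hypothetical negative-well counts. In that case, `1 / f` is the number of cells containing, on average, one LTC-IC. A fold reduction between two conditions is then the ratio of two such estimates.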

Flow cytometry. The following antibodies were used: fluorescein isothiocyanate
(FITC)-conjugated IgG1 (clone 679.1Mc7, Beckman Coulter), Alexa Fluor
488-conjugated IgG1 (clone MOPC-21, BD Pharmingen), allophycocyanin (APC)-conjugated
IgG1 (clone MOPC-21, BD Pharmingen), peridinin chlorophyll protein-cyanin
5.5 (PerCP-Cy5.5)-conjugated IgG1 (clone X40, BD Pharmingen), phycoerythrin-cyanin 7
(PE-Cy7)-conjugated IgG1 (clone MOPC-21, BD Pharmingen), PerCP-Cy5.5-conjugated
CD45 (clone 2D1, BD Pharmingen), APC-conjugated CD34
(clone 581, BD Pharmingen), PE-Cy7-conjugated CD38 (clone HB7, BD
Pharmingen), Alexa Fluor 488-conjugated anti-STAT5 (pY694) (clone 47, BD
Pharmingen) and PE-conjugated anti-GLUT1 (FAB1418P, R&D Systems). For all
experiments, cell viability was assessed using SYTOX Blue dead cell stain
(Invitrogen Life Technologies).

Intracellular STAT5 phosphorylation assays. In brief, 3 X 10° K562 cells per ml
cultured in complete Dulbecco's modified Eagle medium supplemented with 10%
fetal calf serum (PAA), alone or with pioglitazone (10 µM) or imatinib
(1 µM), at 37 °C in 5% CO2, were harvested at the indicated time points. Cells were
fixed and permeabilized using the Cytofix/Cytoperm kit (BD Pharmingen) and
stained with Alexa Fluor 488-anti-phospho-STAT5 monoclonal antibody (BD
Phosflow) or an Alexa Fluor 488 isotype-matched control, which served as the
comparator in each experiment. Analysis was carried out on a minimum of
50,000 events in the viable cell gate. The change in mean fluorescence intensity of
p-STAT5 after drug treatment (p-STAT5 ΔMFI) was determined as follows:
(untreated cells p-STAT5 MFI − untreated cells isotype-control MFI) − (drug-treated
cells p-STAT5 MFI − drug-treated cells isotype-control MFI).
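The ΔMFI formula above translates directly into code. This is a minimal sketch, and the function names are hypothetical. The percentage helper is an illustrative extra that the text does not report.

```python
def pstat5_delta_mfi(untreated_pstat5, untreated_isotype,
                     treated_pstat5, treated_isotype):
    """p-STAT5 ΔMFI as defined in the text: the specific signal of untreated
    cells minus the specific signal of drug-treated cells, where the specific
    signal is the p-STAT5 MFI minus the matched isotype-control MFI."""
    specific_untreated = untreated_pstat5 - untreated_isotype
    specific_treated = treated_pstat5 - treated_isotype
    return specific_untreated - specific_treated


def percent_inhibition(untreated_pstat5, untreated_isotype,
                       treated_pstat5, treated_isotype):
    """Hypothetical derived quantity (not reported in the text): ΔMFI as a
    percentage of the untreated specific signal."""
    delta = pstat5_delta_mfi(untreated_pstat5, untreated_isotype,
                             treated_pstat5, treated_isotype)
    return 100.0 * delta / (untreated_pstat5 - untreated_isotype)
```

Subtracting each condition's own isotype control before taking the difference keeps drug-induced changes in background fluorescence out of the ΔMFI.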

CFSE assays. Fresh CD34+-enriched cells were stained with 2 µM
5-(and 6-)carboxyfluorescein diacetate succinimidyl ester (CFSE, Invitrogen). Cells were
then cultured (seeded at 5.10° per ml) in SFM StemSpan (StemCell Technologies)
without growth factors and with or without pioglitazone (10 µM) or imatinib
(1 µM). Cells cultured in the presence of Colcemid (100 ng ml−1, Invitrogen
Life Technologies) were used to establish the range of fluorescence exhibited by
cells that had not divided during post-labelling incubation. Cells were harvested at
the indicated time points, collected in BD Trucount tubes (BD Biosciences) for absolute
counting and labelled with anti-CD45 and anti-CD34. Cells
were then diluted in 1 ml of phosphate-buffered saline (PBS, Invitrogen Life
Technologies) containing 2% fetal calf serum (PAA) and stained for viability.
All analyses were carried out on a BD FACSCanto II flow cytometer.
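Division peaks in the CFSE assay are defined relative to the colcemid-arrested, undivided population. The sketch below makes two assumptions:

- CFSE fluorescence halves at each division, so a cell at intensity F has divided about log2(F_parent/F) times.
- Absolute counts follow the usual bead-ratio calculation for bead-containing tubes such as Trucount.

All names and numbers are hypothetical. Real gating would be done in the cytometry software.

```python
import math
from collections import Counter


def division_number(cfse, parent_peak):
    """Estimate divisions undergone by one cell, assuming CFSE intensity
    halves at every division (an idealization that ignores dye loss)."""
    if cfse <= 0 or parent_peak <= 0:
        raise ValueError("intensities must be positive")
    return max(0, round(math.log2(parent_peak / cfse)))


def division_histogram(intensities, parent_peak):
    """Cells per division peak; peak 0 corresponds to undivided (P) cells."""
    return Counter(division_number(x, parent_peak) for x in intensities)


def absolute_count_per_ul(cell_events, bead_events, beads_per_tube, volume_ul):
    """Bead-based absolute count: (cell events / bead events) x (beads per
    tube / sample volume). The bead number per tube is lot-specific."""
    return (cell_events / bead_events) * (beads_per_tube / volume_ul)
```

Multiplying the absolute count of CD34+ events by the fraction falling in peak 0 gives a number comparable to the undivided-cell counts plotted as red dots in the CFSE figures.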

DNA synthesis assay. Cell proliferation rate was measured by incorporation of
5-ethynyl-2'-deoxyuridine (EdU), a thymidine nucleoside analogue, into DNA
during active DNA synthesis (2-h labelling). Staining was performed according to the
manufacturer's protocol (Click-iT EdU Flow Cytometry Assay Kit, Invitrogen).
All analyses were carried out on a BD FACSCanto II flow cytometer.


RNA extraction and RT-qPCR analysis. RNA was extracted from 2 X 10° cells
using RNAqueous-4PCR (Ambion). Reverse transcription was carried out for 1 h
at 42 °C using the SuperScript VILO cDNA Synthesis kit (Invitrogen Life Technologies)
according to the manufacturer's instructions. Real-time PCR was performed in an
iCycler thermocycler (CFX, Bio-Rad). The primer and probe sequences for
GAPDH, STAT5B, STAT5A, BCR-ABL, ABL, HIF1α, CITED2, OCT1, MDR1,
SIRT1, STAT3, ALOX5, GLUT1, β-catenin, PML, HIF2α, Bcl-XL, Bcl-2,
PIM-1, CIS, BMI1, HES-1, p57 (CDKN1C) and CD36 (known to be upregulated
by PPARγ agonists and used as a positive control) are reported in
Supplementary Table 1. The primer pairs used with TaqMan Gene Expression
Master Mix (Applied Biosystems) and iQ SYBR Green Supermix (Bio-Rad) are
listed in Supplementary Tables 1a and 1b, respectively. The
comparative Ct method (ΔΔCt) was used to compare gene expression levels
between the different culture conditions (relative to GAPDH).
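The comparative Ct method mentioned above is commonly computed as 2^−ΔΔCt (the Livak method). This minimal sketch assumes equal, near-100% amplification efficiencies for the target and GAPDH assays. The Ct values used in the test are hypothetical.

```python
import math


def fold_change_ddct(ct_target, ct_gapdh, ct_target_ref, ct_gapdh_ref):
    """Relative expression 2**(-ΔΔCt) of a target gene in a test condition
    versus a reference condition, each normalized to GAPDH. Assumes ~100%
    amplification efficiency for both assays."""
    delta_test = ct_target - ct_gapdh          # ΔCt, test condition
    delta_ref = ct_target_ref - ct_gapdh_ref   # ΔCt, reference condition
    return 2.0 ** -(delta_test - delta_ref)


def ddct_for_fold_decrease(fold):
    """ΔΔCt (in cycles) corresponding to an n-fold decrease in expression."""
    return math.log2(fold)
```

Read the other way, the 8.5-fold decrease in STAT5B mRNA reported for pioglitazone corresponds to a ΔΔCt of log2(8.5) ≈ 3.1 cycles under the same efficiency assumption.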

BCR-ABL/ABL quantification. qPCR experiments were performed on cDNA
using the 7000 Sequence Detection System (Applied Biosystems). The BCR-ABL/ABL
ratio was determined using FusionQuant standards (Ipsogen) according to
the Europe Against Cancer protocol. CMR is defined as undetectable minimal
residual disease (negative BCR-ABL transcripts) with a sensitivity
of at least 40,000 amplified copies of the ABL control gene, that is, a reduction of more than
4.5 log by standardized International Scale (IS) RT-qPCR (BCR-ABL/ABL
mRNA ratio < 0.0025%); relapse from CMR is declared when at
least 2 consecutive positive results occur 6 months apart. These criteria are consistent
with the level of sensitivity routinely applied within laboratories participating in
the French GBMHM Network (Group of Molecular Biologists for Hematological
Malignancies).
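The CMR criteria above can be expressed as a per-sample classification plus a relapse rule. The 40,000-copy ABL threshold comes from the text, and a single detected copy at that threshold corresponds to 100/40,000 = 0.0025%. The following elements are assumptions:

- the IS conversion factor;
- the sample data used in the test;
- reading "2 consecutive positives 6 months apart" as a minimum 6-month gap between the two positive samples.

All names are hypothetical.

```python
from dataclasses import dataclass

MIN_ABL_COPIES = 40_000  # sensitivity requirement stated in the text


@dataclass
class Sample:
    month: float            # sampling time (months from an arbitrary origin)
    bcr_abl_copies: float
    abl_copies: float


def status(sample):
    """Classify one RT-qPCR sample against the CMR definition in the text."""
    if sample.bcr_abl_copies > 0:
        return "positive"
    if sample.abl_copies >= MIN_ABL_COPIES:
        return "CMR"
    return "not evaluable"  # undetectable, but sensitivity not reached


def ratio_percent(sample, conversion_factor=1.0):
    """BCR-ABL/ABL ratio in %, scaled by a laboratory-specific IS conversion
    factor (hypothetical default of 1.0)."""
    return 100.0 * conversion_factor * sample.bcr_abl_copies / sample.abl_copies


def relapse_month(samples, min_gap_months=6.0):
    """Month at which relapse would be declared: the second of two
    consecutive positive samples taken at least min_gap_months apart
    (an assumed reading of the criterion). Returns None if no relapse."""
    ordered = sorted(samples, key=lambda s: s.month)
    for prev, cur in zip(ordered, ordered[1:]):
        if (status(prev) == "positive" and status(cur) == "positive"
                and cur.month - prev.month >= min_gap_months):
            return cur.month
    return None
```

The "not evaluable" class matters. Under this definition, a sample with no detectable BCR-ABL does not count as CMR unless enough ABL copies were amplified to reach the stated sensitivity.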

Interphase FISH probe assay. Fluorescent in situ hybridization (FISH) was
performed on interphase nuclei, following standard procedures and using a specific
probe for t(9;22) (MetaSystems, Germany). The probe is designed as a dual-fusion
assay. The red-labelled probe detects an extended region at the ABL1 locus
on 9q34, and the green-labelled probe hybridizes specifically to regions at the BCR
gene on 22q11. Preparations were counterstained with 4′,6-diamidino-2-phenylindole
(DAPI), and a minimum of 50 interphase nuclei were examined. Results
were recorded using a fluorescence microscope (Nikon) fitted with appropriate
filters and the digital-imaging software Lucky (CaryoSystems, France).

Western blot analysis. For STAT5 protein analysis, K562 cells (2.5 X 10°) were
lysed in RIPA lysis buffer on ice. Whole-cell extracts were boiled for 5 min in
Laemmli sample buffer and subjected to SDS-PAGE in 4-12% acrylamide gels
(NuPAGE, Invitrogen Life Technologies). Proteins were transferred to Hybond N+
filters (Amersham). Membranes were probed with the following antibodies:
STAT5 (sc-1656), ACTIN (sc-8432), PPARγ (H-100: sc-7196), goat anti-mouse
IgG-HRP (sc-2005) (Santa Cruz Biotechnology Inc.) and anti-HIF2α
(SMC-185C/D). Antibody binding was detected by enhanced chemiluminescence
(ECL+, Amersham).

Lentiviral vector production and transduction. STAT5B lentiviral vector. The
cDNA encoding STAT5B was cloned, sequenced (GenBank accession number
DQ267926) and inserted into the SIN-cPPT-PGK-WHV lentiviral transfer vector
as previously described. A SIN-cPPT-PGK-eGFP-WHV lentiviral vector was
used as a control.

OCT-1 and HIF2α lentiviral vectors. OCT-1 (SLC22A1, accession number
BC126364) and HIF2α (EPAS1, accession number BC051338) lentiviral vectors
were provided by Applied Biological Materials, Inc. (catalogue nos LV309003 and
LV149063).

Constitutively active murine Stat5a and Stat5b. Stat5a(1*6)(H299R, S711F) and
Stat5b(1*6)(H299R, S711F) retroviral vectors were provided by Cell Biolabs, Inc.
(catalogue nos RTV-333 and RTV-335).

shRNA lentiviral vector anti-PPAR-γ. The PPAR-γ mRNA pairing sequence
5'-TGTTCCGTGACAATCTGTC-3' (GenBank accession number L40904) was
designed and synthesized as follows within an shRNA structure comprising
unique restriction sites at each end: sense
5'-GATCTCCTGTTCCGTGACAATCTGTCTTCAAGAGAACAGATTGTCACGGAACATTTTTGGAAGAATTCC-3';
antisense
5'-CTGAGGAATTCTTCCAAAAATGTTCCGTGACAATCTGTAAGTTCTCTACAGATTGTCACGGAACAGGA-3'.
Oligonucleotides were annealed and ligated into the BglII and XhoI sites of linearized pSuper plasmid.
The Pol III H1 promoter-shRNA PPAR-γ cassette was then subcloned into the pTRIP lentiviral
vector. Vectors were produced as previously described.

BCR-ABL lentiviral vector. Total RNA from K562 cells was extracted using
TRIzol (Invitrogen Life Technologies). Reverse transcription was carried out for
1 h at 50 °C using SuperScript III (Invitrogen Life Technologies). Two independent
PCRs were performed using BCR-ABL F 1, 5'-ATGGTGGACCCGGTGGGCTT-3'
with BCR-ABL R 2831, 5'-CTGCTACCTCTGCACTATGTCACTG-3', and BCR-ABL
F 2685, 5'-TCCGCTGACCATCAATAAGGA-3' with BCR-ABL R 6097,
5'-CTGCTACCTCTGCACTATGTCACTG-3', respectively. Specific amplification
bands were pooled, heated to 95 °C for 3 min and ramp-cooled to 25 °C over a
period of 45 min. The annealing product was submitted to a third PCR with LA Taq
DNA polymerase (Takara) using the following primer pair: BCR-ABL F 1 AscI:
5'-AGGCGCGCCATGGTGGACCCGGTGGGCTT-3' and BCR-ABL R 6097 SbfI:
5'-CCTGCAGGCTGCTACCTCTGCACTATGTCACTG-3'. The amplification
product was subcloned into a pCR-XL-TOPO plasmid (Invitrogen Life Technologies)
before being inserted into the SIV GAE-SSFV lentiviral transfer vector, followed by
DNA sequencing. An SIV GAE-SSFV-eGFP vector was used as a control. The SIV
vectors were produced as previously described.

CD34+ cell transduction. Cells were suspended (1 X 10° ml−1) in StemSpan
(StemCell Technologies, France) supplemented with protamine sulphate
(4 µg ml−1), SCF (100 ng ml−1), FLT-3-L (100 ng ml−1), IL-3 (20 ng ml−1) and
IL-6 (20 ng ml−1), in a 96-well plate coated with RetroNectin (Takara Shuzo Co.,
Japan). Cell suspensions were incubated for 16 h. Lentiviral vectors were then
added and cell suspensions incubated for 12 h. Cells were washed twice before
being seeded.

siRNA assays. An siRNA targeting the human PPARγ sequence
5'-TGTTCCGTGACAATCTGTC-3' was synthesized (Sigma-Aldrich Proligo). siRNAs targeting
the human STAT5 sequence 5'-AAACTCAGGGACCACTTGC-3', the human HIF2α
sequence 5'-ATTAGAGCAAAGAGTCAGC-3' and the murine Cited2 sequence
5'-CGAGGAAGTGCTTATGTCCTT-3' were synthesized (Eurofins MWG Operon).
CD34+ BM cells were transfected with specific siRNA (25 nM) or control siRNA in
the presence of Lipofectamine 2000 (Invitrogen) and maintained for 48 h before
CFC assay. Control siRNA was purchased from Invitrogen Life Technologies
(BLOCK-iT). Transfection efficiency was assessed using a fluorescein-labelled,
double-stranded RNA duplex (BLOCK-iT Fluorescent Oligo; Invitrogen).

Human patients. Fresh bone marrow from patients with chronic-phase CML at
diagnosis, umbilical cord blood cells from healthy donors, and blood or bone
marrow samples from diabetes patients and from the patient given pioglitazone off-label
were obtained with informed consent approved by the hospital's Institutional
Review Board (Comité de protection des personnes Ile-de-France XI) under
approved protocol EudraCT number 2009-011675-79. Imatinib levels in patients'
plasma were measured using a previously described high-performance liquid
chromatography (HPLC) method.

Statistical analysis. No statistical methods were used to predetermine sample size;
the experiments were not randomized, and the investigators were not blinded to
allocation during experiments and outcome assessment. For culture assays and
quantitative real-time PCR, values were calculated as mean ± standard deviation
for at least three separate experiments performed in triplicate. Paired and unpaired
comparisons were made using the nonparametric Wilcoxon rank test and the
Mann-Whitney test, respectively. Limiting dilution analysis was carried out with
L-Calc software (StemCell Technologies). All statistical analyses were carried out
with StatView software (SAS Institute Inc., Cary, NC).
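The two nonparametric tests named above are sketched here in plain Python for illustration only. This is not StatView's implementation, and it rests on several assumptions:

- The paired "Wilcoxon rank test" is taken to be the Wilcoxon signed-rank test.
- Exact p values come from enumerating all sign patterns, which is practical only for small n, as in the patient series here.
- The normal approximations use no tie or continuity correction, so results may differ slightly from StatView or other packages.

```python
import math
from itertools import product


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Paired two-sided test. Returns (W+, exact p, normal-approximation p).
    Zero differences are dropped."""
    d = [b - a for a, b in zip(x, y) if b != a]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mean = n * (n + 1) / 4
    observed = abs(w_plus - mean)
    # Exact null distribution: every pattern of signs is equally likely.
    extreme = sum(
        1 for signs in product((False, True), repeat=n)
        if abs(sum(r for r, s in zip(ranks, signs) if s) - mean) >= observed - 1e-9
    )
    exact_p = extreme / 2 ** n
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    approx_p = math.erfc(observed / sd / math.sqrt(2))
    return w_plus, exact_p, approx_p


def mann_whitney_u(a, b):
    """Unpaired two-sided test. Returns (U for sample a, normal-approximation p)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u_a, math.erfc(abs(u_a - mean) / sd / math.sqrt(2))
```

Take five pairs that all change in the same direction. The exact two-sided p cannot fall below 2/32 = 0.0625, whereas the normal approximation gives about 0.043. Which of the two a given package reports therefore matters for small patient series.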

Statistical information on samples described in figures. For all culture assays,
paired and unpaired comparisons were made using the nonparametric Wilcoxon
rank test and the Mann-Whitney test, respectively. Fig. 1a, b, plotted are means for
CD34+ cells from 4 CP-CML patients, 16 replicates for each. Data with imatinib
alone are not statistically different from those for the untreated control
(P = 0.067), whereas pioglitazone as a single agent reduced LTC-IC frequencies
by 2.4-fold (P = 0.008), or by 3.5-fold in combination with imatinib (P < 0.001).
LTC-IC frequencies were established using L-Calc software (StemCell
Technologies). Fig. 1c-e, all patients and statistical analysis are presented in Extended
Data Table 1 (n = 6). Fig. 2a, STAT5B RT-qPCR normalized to GAPDH mRNA.
Shown are means with standard deviations (s.d.) for 5 CP-CML patients. STAT5B
mRNA levels decreased by 8.5-fold (P < 0.0001), 1.5-fold (P = 0.08) and 10.5-fold
(P < 0.0001) in the presence of pioglitazone, imatinib and the drug combination,
respectively. Fig. 2b, compared with imatinib alone, RT-qPCR analysis shows that
addition of pioglitazone induces a significant reduction in mRNA levels by 3.3- and
4.8-fold for BCL-XL and BCL-2, respectively, and by 1.6-fold for PIM1 and CIS. mRNA
quantifications are normalized to GAPDH levels (n = 5 CP-CML patients,
*P < 0.05). Fig. 2c, means with s.d. for 5 CP-CML patients. The effect of
pioglitazone was negated by an siRNA against PPARγ (P = 0.043); siRNA validated in
Extended Data Fig. 1c. Fig. 2d, means with s.d. for 5 CP-CML patients. STAT5
overexpression counteracted the pioglitazone effect (P = 0.0047); LvSTAT5 validated in
Extended Data Fig. 1c. Fig. 3a, results are normalized to GAPDH mRNA levels and
represented relative to mRNA expression for the “Imatinib alone” condition
(means of 11 patients with s.d. for each mRNA assessed). Fig. 3b, results are
normalized to GAPDH mRNA levels and represented relative to mRNA
expression of “untreated cells” (means of 6 patients with s.d.). Compared with untreated
controls, cells treated with imatinib alone show a 6.2-fold increase of HIF2α
(P < 0.011) and a 4.5-fold increase of CITED2 (P = 0.0277).
The addition of pioglitazone reduced the imatinib-mediated HIF2α increase
to 2.8-fold (P = 0.027); that is, pioglitazone significantly reduced HIF2α induction by
2.2-fold (P = 0.027), and it fully counteracted CITED2 induction. Pioglitazone alone
had no effect compared with control. Fig. 4b, according to IS, relapse was declared after 2
consecutive positive results, 6 months apart. For Extended Data Figures, statistical
information is included in their legends.

Statistical information regarding synergy determination. The putative
synergistic effect of multiple drugs was determined using the algorithm and definitions
of the Chou-Talalay median-effect method. The imatinib concentration
required for 50% inhibition of the number of colonies obtained after CFC assay
(IC50) was first determined in CD34+ cells from 5 CP-CML patients at
diagnosis. There was marked variability between patients (0.6 µM < IC50 < 2 µM, with
a median of 1 µM). Accordingly, at 1 µM imatinib, CFC numbers
were 47.1% ± 24 of untreated CFC numbers in our full cohort of 29
CP-CML patients (Extended Data Figure 1). The corresponding values were 65.7% ± 26
and 12% ± 15 of untreated CFC numbers for pioglitazone alone (10 µM) and the
combination (imatinib 1 µM, pioglitazone 10 µM), respectively. A combination
index (CI) < 1 defined synergy. We calculated IC50 and CI values with the
CalcuSyn software (Biosoft, Cambridge, UK). Imatinib and pioglitazone were
assumed to have independent modes of action. Under these conditions, CI values were
always less than 0.248, indicating synergy between the two drugs.
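The Chou-Talalay analysis above can be reproduced in outline from single-drug dose-effect data. This minimal sketch rests on the following assumptions:

- The median-effect equation fa/fu = (D/Dm)^m is fitted by linear regression of log(fa/fu) on log(D).
- fa is the fraction of colonies inhibited, that is, 1 − colonies/untreated colonies.
- The default non-exclusive option adds the d1·d2/(Dx1·Dx2) term. In the Chou-Talalay framework, as we understand it, that term applies to mutually non-exclusive drugs (independent modes of action).

This is not CalcuSyn's code, the function names are hypothetical, and the test data are synthetic.

```python
import math


def fit_median_effect(doses, fa):
    """Fit log(fa/fu) = m*log(D) - m*log(Dm) by least squares.

    doses: single-drug concentrations; fa: fraction affected at each dose
    (0 < fa < 1). Returns (m, Dm): slope and median-effect dose.
    """
    xs = [math.log(d) for d in doses]
    ys = [math.log(f / (1.0 - f)) for f in fa]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    intercept = mean_y - m * mean_x  # equals -m * log(Dm)
    return m, math.exp(-intercept / m)


def dose_for_effect(fa, m, dm):
    """Single-drug dose Dx producing fraction affected fa."""
    return dm * (fa / (1.0 - fa)) ** (1.0 / m)


def combination_index(d1, d2, fa, params1, params2, mutually_exclusive=False):
    """CI for doses d1 + d2 that together produced fraction affected fa.

    params1, params2: (m, Dm) for each drug alone. The extra product term
    is used for mutually non-exclusive drugs (independent modes of action).
    """
    dx1 = dose_for_effect(fa, *params1)
    dx2 = dose_for_effect(fa, *params2)
    ci = d1 / dx1 + d2 / dx2
    if not mutually_exclusive:
        ci += (d1 * d2) / (dx1 * dx2)
    return ci
```

By the usual convention, CI < 1 is read as synergy, CI = 1 as additivity and CI > 1 as antagonism.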
Extended Data Figure 1 | Clonogenicity assays in the presence of various
PPARγ agonists and validation of STAT5B overexpression and anti-PPARγ,
anti-STAT5 and anti-HIF2α siRNA. a, Clonogenic capacities of BM
CD34+ cells were assayed following pre-incubation for 2 days with culture
medium alone (control) or supplemented with the PPARγ agonists PGJ2,
troglitazone (Tro), ciglitazone (Cig), rosiglitazone (Ros), pioglitazone (Pio) or
MCC-555 (MCC) (25 µM each) (samples from 4 donors in triplicate). The
number of colonies scored is expressed as a percentage of control (untreated)
values with standard deviation (s.d.); *P < 0.05 using the nonparametric
Wilcoxon rank test. b, Validation of the anti-PPARγ siRNA used in Fig. 2b. CD34+
cells were transfected with irrelevant or PPARγ-targeting siRNA (25 nM each).
An anti-PPARγ shRNA was used as a positive control. PPARγ transcripts
were normalized to GAPDH transcripts and expressed relative to the
levels measured in untransfected cells. c, Western blot analysis with PPARγ,
pan-STAT5, HIF2α and anti-actin antibodies (Ab). Validation of siRNA
against PPARγ or STAT5 and of the lentivector expressing STAT5B (LvSTAT5B)
was performed on CD34+ cells from human UCB. Validation of siRNA against
HIF2α was performed on the K562 cell line. Ctrl, scrambled siRNA; −, untreated.
Quantification of western blot signal was performed with ImageJ software
(http://rsb.info.nih.gov/ij/). Histograms show mean values with s.d., n = 3.
Extended Data Figure 2 | Differential and synergistic effects of pioglitazone and TKIs on CML cells. a, CFC assays with CD34+ CP-CML cells from patients at diagnosis. Imatinib and/or pioglitazone were added for 48 h before CFC assays. Means of 29 patients with standard deviation (s.d.). b, CFC assays after lentivector-mediated expression of BCR-ABL or eGFP (negative control) in human cord blood CD34+ cells. Imatinib and/or pioglitazone were added for 48 h before CFC assays. Means of 3 individuals in triplicate with s.d. c, d, Limiting dilution analysis (LDA) of CML LSCs by LTC-IC and frequency analysis. Plotted are means for CD34+ cells from 2 CP-CML patients, 16 replicates each. Imatinib 1 μM, rosiglitazone 10 μM. e, f, CFSE analysis of CD34+ cells (>96% Ph+) from CP-CML patients (for all experiments, imatinib 1 μM, dasatinib 0.146 μM, pioglitazone and rosiglitazone 10 μM, JQ1 1 μM; imat, imatinib; dasa, dasatinib; pio, pioglitazone; (P), undivided). To confirm the pivotal role played by STAT5 in the mechanism by which pioglitazone erodes the pool of TKI-resistant CML LSCs, we investigated the effect of the bromodomain inhibitor JQ1, which inhibits the transcriptional function of STAT5 by targeting bromodomain-containing protein 2 (BRD2), a key cofactor of STAT5. Although this study with JQ1 is corroborative, one cannot completely exclude the possibility that these effects are coincidental, as targeting BRDs may cause a series of effects independent of STAT5.
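How the LTC-IC frequencies in panels c and d were estimated is not described here. As a minimal sketch, assuming the standard single-hit Poisson model (a well seeded with d cells scores negative with probability exp(−f·d), where f is the stem-cell frequency), the maximum-likelihood f can be found by bisection on the derivative of the log-likelihood. The function name and all counts below are hypothetical; the estimate is only defined when at least one well is positive and at least one is negative.

```python
import math

def lda_frequency(doses, n_wells, n_negative, lo=1e-9, hi=1.0, iters=200):
    """Maximum-likelihood stem-cell frequency under the single-hit Poisson model.

    doses: cells seeded per well at each dilution
    n_wells: wells tested at each dilution
    n_negative: wells without a positive readout at each dilution
    Model: P(well negative | d cells) = exp(-f * d).
    """
    def score(f):
        # Derivative of the log-likelihood with respect to f; it decreases as f grows.
        total = 0.0
        for d, n, neg in zip(doses, n_wells, n_negative):
            pos = n - neg
            p_neg = math.exp(-f * d)
            p_pos = -math.expm1(-f * d)  # 1 - exp(-f*d), computed without cancellation
            total += -neg * d + pos * d * p_neg / p_pos
        return total

    # Bisection on log(f) for the root of the score function.
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if score(math.exp(mid)) > 0:
            a = mid  # likelihood still increasing: the MLE lies above mid
        else:
            b = mid
    return math.exp(0.5 * (a + b))

# Hypothetical counts: three dilutions, 16 replicate wells each.
f = lda_frequency(doses=[250, 500, 1000], n_wells=[16, 16, 16], n_negative=[13, 10, 6])
print(f"estimated frequency: 1 in {1 / f:.0f} cells")
```

With a single dilution the estimate reduces to the closed form f = −ln(negative fraction)/d, which is a convenient sanity check. Dedicated tools such as ELDA also report confidence intervals, which this sketch omits.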
Extended Data Figure 3 | Pioglitazone slowly decreases STAT5 expression whereas imatinib rapidly inhibits STAT5 phosphorylation. a, Differential kinetics of action of imatinib and pioglitazone. CD34+ CP-CML cells (patient 4) in liquid culture in serum-free medium without cytokines. b, Rate of apoptosis in CP-CML cell populations after 4 days of culture with imatinib and/or pioglitazone (n = 5; *P < 0.05). Solid bars (black for CD38” and grey for CD38"), percentage of recovery relative to input and normalized to untreated controls. Hatched bars, percentage of apoptosis, defined by the expression of annexin V. c, Flow cytometry analysis of permeabilized K562 cells with IgG against phosphorylated (Tyr694) STAT5. Untreated (black) and drug-treated (red or blue). Control panel, no drug treatment but irrelevant IgG isotype control (grey peak). d, Western blot analysis with pan-STAT5 and anti-actin antibodies, showing a decrease of STAT5 by 3.5 ± 0.5-fold (s.d.) in lane 4 (n = 3). Lanes 1 and 2 for imatinib (15 and 30 min exposure, respectively); lanes 3 and 4 for Pio (72 and 96 h exposure, respectively). Ratio indicates the ratio of STAT5 expression to β-actin expression relative to lane 1. Quantification of western blot signals (n = 3 for each condition) was performed with ImageJ software (http://rsb.info.nih.gov/ij/).
Extended Data Figure 4 | Forced expression of STAT5 in CP-CML CD34+ cells increases the compartment of quiescent cells. a, CFSE analysis of CP-CML CD34+ cells treated with pioglitazone after transduction with lentivectors (Lv) expressing eGFP or STAT5B, whose transcription is PPARγ-independent. Representative CP-CML patient 2 in triplicate (data for all patients are in Extended Data Fig. 2e). One coloured peak for each cell division number. P, colcemid-arrested 'parent cells'. b, Distribution (%) of CD34+ cells in each division peak shown in Extended Data Fig. 3a. c, STAT5 mRNA expression analysis. d, Transduction efficiency of STAT5 lentivector (5 replicates with s.d.). e, Data for the 3 patients tested.
Extended Data Figure 5 | High toxicity of pioglitazone for CML LSCs vs. low toxicity for normal HSCs. LTC-IC (LDA) showing differential toxicity of pioglitazone for CP-CML vs. normal CD34+ cells (n = 3, 16 replicates for each).
Extended Data Figure 6 | Erosion of undivided and imatinib-resistant CD34+ CP-CML cells is OCT1-independent. a, Efficiency of LvOCT1 transduction (D8). b, OCT1 mRNA expression. Results are normalized to GAPDH mRNA levels and represented relative to mRNA expression for the 'imatinib alone' condition. c, CFSE analysis and absolute cell count in the presence of imatinib, with or without OCT1 overexpression. Left scale (black), total cells showing CD34+ vs. CD34− cells (histograms). Right scale (red), undivided CD34+ cells (red dots) (representative of n = 3 CP-CML patients).
Extended Data Figure 7 | The viability of undivided (P) and imatinib-resistant CD34+ CP-CML cells depends on HIF2α expression. Representative CP-CML patient 8 in triplicate (data for all patients are in Extended Data Fig. 6e). a, CFSE analysis in the presence of siRNA against STAT5 or HIF2α in CD34+Ph+ cells treated or not with imatinib. One coloured peak for each cell division number. P, colcemid-arrested 'parent cells'. b, Distribution (%) of CD34+ cells in each division peak. c, HIF2α mRNA expression 72 h after siHIF2α transfection. d, STAT5A and STAT5B mRNA expression 72 h after siSTAT5 transfection into human UCB CD34+ cells. Results are normalized to GAPDH mRNA levels (means of 5 experiments with s.d. for each gene assessed). e, Data for the 5 patients tested (*P < 0.05 relative to siCtrl; #P < 0.05 relative to imatinib + siCtrl). f, CFSE analysis of cord blood CD34+ cells after transduction with lentivectors (Lv) expressing HIF2α or eGFP. One coloured peak for each cell division number. P, colcemid-arrested 'parent cells'. g, Distribution (%) of CD34+ cells in each division peak (n = 5). h, Transduction efficiency of HIF2α Lv. i, HIF2α mRNA expression (means of 5 experiments with s.d.).
Extended Data Figure 8 | Expression of target genes in CD34+ cells and Ba/F3 cell line CML models. a, mRNA expression of target genes in CD34+ cells from UCB transduced or not with a BCR-ABL-expressing lentivector (Lv). BCR-ABL+ cells were cultured in serum-free medium without cytokines for 7 days with either imatinib alone (1 μM) or imatinib and pioglitazone (1 μM and 10 μM, respectively) (means of 5 experiments with s.d. for each gene assessed). Results are normalized to GAPDH mRNA levels and represented relative to mRNA expression for the 'untreated' condition. Overexpression of BCR-ABL in CD34+ cells from umbilical cord blood induced expression of STAT5 and HIF1α mRNAs by 2.7- and 2.8-fold, respectively (P = 0.043), while HIF2α and CITED2 mRNAs were increased by 12.5-fold and 9-fold (P = 0.043), respectively. In the presence of imatinib, STAT5 and HIF1α mRNAs were increased by 7.5- and 4.9-fold (P = 0.043), respectively, while HIF2α and CITED2 were increased by 19.3- and 22-fold (P = 0.043), respectively. Either pioglitazone or an siRNA against STAT5 (A and B) significantly reduced the levels of HIF2α and CITED2 mRNAs, while an siRNA against HIF2α significantly reduced CITED2 mRNA expression (>threefold each, P < 0.05). b, mRNA expression of target genes in Ba/F3 cell sub-lines rendered independent of IL-3 for viability by transduction with LvBCR-ABL or constitutively activated Stat5A1*6 (A*) or Stat5B1*6 (B*). Results are normalized to GAPDH mRNA levels and represented relative to mRNA expression for the original Ba/F3 cell line (means of 5 experiments with s.d. for each gene assessed). Forced expression of BCR-ABL increased the level of murine endogenous Stat5 (a and b) mRNAs by 2.7-fold (P = 0.043). When BCR-ABL or constitutively activated murine Stat5 1*6 (a or b) were overexpressed, the murine endogenous Hif1a mRNA level was decreased by threefold (P = 0.043) and murine endogenous Hif2a and Cited2 mRNAs increased by more than eightfold each (P = 0.043).
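The legends above report transcript levels "normalized to GAPDH mRNA levels and represented relative to" a reference condition, without naming the calculation. One common way to do this is the 2^−ΔΔCt method; the sketch below assumes that method and roughly 100% amplification efficiency, and the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_gapdh, ct_target_ref, ct_gapdh_ref):
    """Fold change of a target transcript by the 2^-ddCt method.

    Assumes roughly 100% amplification efficiency for both target and GAPDH.
    """
    delta_ct_sample = ct_target - ct_gapdh       # normalize target to GAPDH in the sample
    delta_ct_ref = ct_target_ref - ct_gapdh_ref  # same normalization in the reference condition
    delta_delta_ct = delta_ct_sample - delta_ct_ref
    return 2.0 ** (-delta_delta_ct)

# Hypothetical Ct values: the target crosses threshold 3 cycles earlier than in the
# reference condition at equal GAPDH, i.e. about 8-fold higher expression.
print(relative_expression(ct_target=24.0, ct_gapdh=18.0, ct_target_ref=27.0, ct_gapdh_ref=18.0))  # prints 8.0
```

When amplification efficiencies differ noticeably from 100%, an efficiency-corrected variant (for example, the Pfaffl method) is usually preferred.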
Extended Data Figure 9 | The key regulator of HSC quiescence, CITED2, is overexpressed in TKI-resistant CD34+ cells from CP-CML patients. a, mRNA expression of CITED2 and its target genes BMI1, HES1 and p57 after 9 days of culture with or without imatinib and pioglitazone. Results are normalized to GAPDH (n = 4). b, mRNA expression of endogenous murine Cited2 and its target genes Bmi1, Hes1 and p57 in the Ba/F3 cell line with or without forced expression of BCR-ABL or constitutively active Stat5A1*6 (A*), in the presence or absence of siRNA against Cited2. Results are normalized to GAPDH (mean ± s.d. of 3 independent experiments in triplicate). Forced expression of a constitutively active form of murine Stat5 1*6 (A or B) in Ba/F3 cells, in and of itself, was sufficient to increase endogenous expression of murine Cited2 markedly (52-fold) as well as that of its target genes Bmi1 (2.5-fold), Hes1 (13-fold) and p57 (18-fold). c, Proliferation analysis by EdU incorporation assay of the Ba/F3 cell line with or without expression of constitutively active Stat5A1*6 (A*) or Stat5B1*6 (B*), in the presence or absence of siRNA against Cited2 (representative result of 5 independent experiments).
Extended Data Table 1 | CFSE analysis of CD34+ cells (>96% Ph+) from 6 CP-CML patients after liquid culture without cytokines

Patient | Untreated D10 CD34+ (×10³) | Untreated D10 undivided (P) | Imatinib D10 CD34+ (×10³) | Imatinib D10 undivided (P) | Pio D10 CD34+ (×10³) | Pio D10 undivided (P) | Imatinib + Pio D10 CD34+ (×10³) | Imatinib + Pio D10 undivided (P)
2 | 1.3 | 310 | 4.0 | 410 | 6.0 | 60 | 0.8 | 45
3 | 3.6 | 44 | 0.6 | 60 | 2.8 | 22 | 0.5 | 15
4 | 1.5 | 129 | 3.6 | 150 | 6.0 | 69 | 2.3 | 27
5 | 3.0 | 99 | 1.0 | 132 | 0.7 | 60 | 0.5 | 81
6 | 1.5 | 128 | 2.4 | 63 | 9.0 | 35 | 1.3 | 48
4 | 7.8 | 220 | 1.3 | 251 | 1.8 | 153 | 0.8 | 107
Mean (undivided) | | 155 | | 177.6 | | 66.5 | | 53.8

P = 0.248; P = 0.027
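The mean row of the table can be checked directly against the per-patient undivided (P) counts. The short script below uses only the values printed in the table; it is a consistency check, not part of the original analysis.

```python
from statistics import mean

# Undivided (P) CD34+ cell counts per patient, as listed in Extended Data Table 1
undivided = {
    "Untreated": [310, 44, 129, 99, 128, 220],
    "Imatinib": [410, 60, 150, 132, 63, 251],
    "Pio": [60, 22, 69, 60, 35, 153],
    "Imatinib + Pio": [45, 15, 27, 81, 48, 107],
}
# Means as reported in the table's last row
reported = {"Untreated": 155, "Imatinib": 177.6, "Pio": 66.5, "Imatinib + Pio": 53.8}

for condition, counts in undivided.items():
    print(f"{condition}: computed {float(mean(counts)):.2f}, reported {reported[condition]}")
```

All four computed means agree with the reported row to within 0.1; the imatinib mean (1066/6 ≈ 177.67) appears in the table truncated to 177.6.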
The spliceosome is a therapeutic vulnerability in MYC-driven cancer

Tiffany Y.-T. Hsu, Lukas M. Simon, Nicholas J. Neill, Richard Marcotte, Azin Sayad, Christopher S. Bland,
MYC (also known as c-MYC) overexpression or hyperactivation is
one of the most common drivers of human cancer. Despite intensive 
study, the MYC oncogene remains recalcitrant to therapeutic 
inhibition. MYC is a transcription factor, and many of its pro- 
tumorigenic functions have been attributed to its ability to regulate 
gene expression programs’ *. Notably, oncogenic MYC activation 
has also been shown to increase total RNA and protein production in 
many tissue and disease contexts*’”. While such increases in RNA 
and protein production may endow cancer cells with pro-tumour 
hallmarks, this increase in synthesis may also generate new or heigh- 
tened burden on MYC-driven cancer cells to process these macro- 
molecules properly*. Here we discover that the spliceosome is a new 
target of oncogenic stress in MYC-driven cancers. We identify 
BUD31 as a MYC-synthetic lethal gene in human mammary epithelial
cells, and demonstrate that BUD31 is a component of the core
spliceosome required for its assembly and catalytic activity. Core 
spliceosomal factors (such as SF3B1 and U2AF1) associated with 
BUD31 are also required to tolerate oncogenic MYC. Notably, 
MYC hyperactivation induces an increase in total precursor messen- 
ger RNA synthesis, suggesting an increased burden on the core 
spliceosome to process pre-mRNA. In contrast to normal cells, par- 
tial inhibition of the spliceosome in MYC-hyperactivated cells leads 
to global intron retention, widespread defects in pre-mRNA mat- 
uration, and deregulation of many essential cell processes. Notably, 
genetic or pharmacological inhibition of the spliceosome in vivo 
impairs survival, tumorigenicity and metastatic proclivity of 
MYC-dependent breast cancers. Collectively, these data suggest that 
oncogenic MYC confers a collateral stress on splicing, and that 
components of the spliceosome may be therapeutic entry points 
for aggressive MYC-driven cancers. 

To discover genes and cellular processes required to tolerate onco- 
genic MYC expression, we previously performed a genome-wide 
MYC-synthetic lethal screen in human mammary epithelial cells 
(HMECs) engineered with an inducible MYC and oestrogen receptor 
fusion protein (MYC-ER) for candidates affecting cell viability in a 
MYC-selective manner’. This screen nominated BUD31 as a candidate 
MYC-synthetic lethal gene (Fig. 1a), in which barcoded BUD31 short 
hairpin RNAs (shRNAs) consistently dropped out of the population in 
MYC-hyperactivated cells relative to cells without MYC induction
(Fig. 1b). In validation experiments, BUD31 depletion restrained clonogenic
growth and activated apoptosis in MYC-induced cells, as compared
to MYC-normal cells (Extended Data Fig. 1a–c). Expression of
shRNA-resistant BUD31 rescued the MYC-synthetic lethal phenotype
of BUD31 shRNA (Fig. 1c and Extended Data Fig. 1d), indicating that
the phenotype is an on-target effect of RNA interference (RNAi).

BUD31 has been linked to the spliceosome in yeast”®, but its func- 
tion in mammalian systems has not been determined. To uncover the 
molecular function(s) of BUD31, we identified BUD31-interacting 
proteins by Flag-tagged BUD31 immunoprecipitation from cells with 
or without RNase A (which eliminates protein-protein interactions 
mediated by RNA tethering), followed by mass spectrometry. 
Remarkably, 79 out of 134 core spliceosomal components were assoc- 
iated with BUD31 (Extended Data Fig. 2a), suggesting a strong asso- 
ciation between BUD31 and the spliceosome in human cells. 

The spliceosome is a dynamic molecular machine consisting of sev- 
eral nuclear protein complexes that cycle on and off of pre-mRNA 
during intronic splicing’. Co-immunoprecipitation experiments 
confirmed that BUD31 associates with several subcomplexes of the 
spliceosome, including the Prp19-CDC5L subcomplex (PRPF19), the 
U2 small nuclear ribonucleoprotein particles (snRNPs; SF3B1 and 
SF3A1), U2-related factors (U2AF1), the U5 snRNP (EFTUD2), and 
Sm proteins (SNRPF) (Fig. 1d and Extended Data Fig. 2c), but inter- 
action with non-spliceosomal proteins was not detected (Extended Data 
Fig. 2d, e). To test more broadly the association of BUD31 with sub- 
complexes of the spliceosome, we performed bimolecular fluorescence 
complementation (BiFC) between BUD31 and proteins from each 
major spliceosomal subcomplex. BiFC analysis indicated that BUD31 
associates with components of the major snRNPs (U1, U2, U4/U6 and 
U5) as well as Sm proteins (Fig. 1e and Extended Data Fig. 2b), indicating
that BUD31 is present at several stages of spliceosomal assembly.

To examine more directly whether BUD31 has a role in pre-mRNA
splicing, we tested in vitro splicing efficiency using nuclear extracts 
with or without BUD31 knockdown. BUD31 loss significantly inhib- 
ited pre-mRNA splicing (Extended Data Fig. 2f-i). In addition, knock- 
down of BUD31 led to defects in early spliceosome assembly, as 
indicated by impaired formation of complex A (Extended Data Fig. 
2h, i). Collectively, these data indicate that HMECs require a core 
spliceosomal protein (BUD31) to tolerate dysregulated MYC. 

We proposed that cells with oncogenic MYC required BUD31 for 
cell survival because of its role in the spliceosome. To test this hypo- 
thesis, we generated a BUD31 mutant deficient in binding core spli- 
ceosomal proteins by mutating a highly conserved region spanning a 
C,-C, zinc-finger. Mutation of this region abrogated BUD31 inter- 
action with spliceosomal proteins (Extended Data Fig. 2j). To deter- 
mine whether this region is also necessary for cells to tolerate MYC 
hyperactivation, we performed an in vitro competition assay. Green fluorescent protein (GFP)-expressing MYC-driven breast cancer cells encoding inducible BUD31 shRNA were transduced with shRNA-resistant wild-type or mutant BUD31 complementary DNA (cDNA), and these cells were mixed with non-transduced, GFP-negative cells. BUD31 knockdown significantly inhibited the proliferation of MYC-driven cancer cells. Proliferation was fully rescued by wild-type BUD31 cDNA but not by a BUD31 mutant deficient in spliceosomal binding (Fig. 1f), suggesting that BUD31 association with the spliceosome is required to support the survival of MYC-hyperactivated cells. More broadly, these results indicate that oncogenic MYC may increase cellular dependency on spliceosome function. By contrast, ectopic expression of the oncogenes HER2 (also known as ERBB2) and EGFR did not enhance the effects of BUD31 depletion (Extended Data Fig. 3a, b), suggesting that the stress imposed by MYC on spliceosomal function is not a universal feature of the oncogenic state.

To test whether one or more subcomplexes of the spliceosome are required to tolerate aberrant MYC activity, we examined additional components of spliceosome assembly and catalysis including SF3B1 (U2 snRNP), U2AF1 (U2-related splicing factor), EFTUD2 (U5 snRNP) and SNRPF (core Sm protein found in every snRNP complex). Notably, partial depletion of each spliceosomal component led to loss of cell viability (Fig. 1h–k and Extended Data Fig. 4a–d) and increased apoptosis (Extended Data Fig. 4e–h) in MYC-hyperactivated cells. This suggests that several subcomplexes of the core spliceosome are required for cells to tolerate oncogenic MYC, and that MYC-hyperactivated cells are sensitive to modest perturbations in spliceosome function.

Next, we investigated whether pharmacological inhibition of the spliceosome is also synthetic lethal with MYC. Several pharmacological agents (for example, FR901464, pladienolides and their derivatives) have been shown to bind components of the core SF3b spliceosomal complex and inhibit spliceosome function'”. However, most of these inhibitors are not amenable to in vivo delivery. We developed a new small-molecule inhibitor of SF3B1, known as SD6, that impairs spliceosome function and is bioavailable in mammals’’. Consistent with our genetic data, low SD6 concentrations significantly suppressed colony formation (Fig. 1g) and induced apoptosis (Extended Data Fig. 4i) in a MYC-selective manner. The synthetic-lethal interaction between MYC hyperactivation and core spliceosome perturbation suggests that pre-mRNA splicing is necessary to tolerate oncogenic MYC.

Figure 1 | The spliceosome is required for cells to tolerate oncogenic MYC hyperactivation. a, BUD31 is a MYC-synthetic lethal gene. b, BUD31 shRNA (shBUD31) barcode abundances with/without MYC-ER hyperactivation (mean ± s.e.m., n = 3 biological replicates). c, Relative number of MYC-ER HMECs with dox-inducible shRNA targeting the 3′ untranslated region (UTR) of BUD31, and constitutive shRNA-resistant Flag–GFP or Flag–BUD31 expression (mean ± s.e.m., n = 4 technical replicates). d, Flag–BUD31 co-immunoprecipitation for core spliceosomal factors. e, Interaction between BUD31 and spliceosomal proteins assessed by BiFC (mean ± s.e.m., n = 3 technical replicates). f, GFP+ MYC-dependent cells with inducible shBUD31-UTR and constitutive wild-type, mutant BUD31, or negative control cDNA expression were mixed with GFP− cells and passaged (mean ± s.e.m., n = 8 technical replicates, two-tailed Student's t-test). g, Change in MYC-ER HMEC clonogenicity after SD6 treatment (mean ± s.e.m., n = 4 technical replicates, two-tailed Student's t-test). h–k, Relative number of MYC-ER HMECs after partial depletion of core spliceosomal proteins (mean ± s.e.m., n = 4 technical replicates, one-way analysis of variance (ANOVA)). **P < 0.01, ***P < 0.001.

In many different cell lineages and experimental systems, oncogenic 
MYC activation has been shown to amplify the synthesis of cellular 
mRNA through direct or indirect mechanisms**’*”*. In agreement, 
MYC hyperactivation in HMECs increased total cellular mRNA syn- 
thesis and mRNA steady-state levels (Fig. 2a) without an increase in 
cellular growth rate (Extended Data Fig. 3c). In contrast to a recent report 
in B-cell compartments’*, MYC hyperactivation did not affect the levels 
of spliceosome proteins in HMECs (data not shown), suggesting that 
increased pre-mRNA dosage is not compensated for by higher spliceo- 
some levels. Thus, we proposed that the MYC-induced increase in global 
mRNA synthesis confers increased pressure on the spliceosome to pro- 
cess pre-mRNAs, and partial perturbation of the spliceosome would lead 
to widespread defects in the splicing of pre-mRNA introns in the MYC- 
hyperactive state. To test this hypothesis, we compared intron retention 
(IR) after BUD31 knockdown in MYC-normal or MYC-hyperactivated 
cells. We performed RNA-sequencing (RNA-seq) from cells in each state 
(normal, BUD31 knockdown, MYC-hyperactive, and MYC-hyperactive 
with BUD31 knockdown) and determined the pre-mRNA splicing effi- 
ciency by calculating IR at junctions across the genome (Fig. 2b). Because 
the analysis of intronic reads may be influenced by the presence of stable 
RNAs within introns and/or spliced lariats, we restricted the analysis to 
reads directly spanning exon-intron or exon-exon junction sequences 
(75,623 junctions in 6,861 genes) (see Methods). 
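The exact IR coefficient is defined in the Methods, which are not part of this excerpt. As a minimal sketch, assuming an IR coefficient equal to the fraction of junction-spanning reads that read through the exon–intron boundary (exon–intron reads divided by exon–intron plus exon–exon reads), per-junction and gene-level values could be computed as below. The class and function names are hypothetical, and pooling reads across a gene's junctions is only one possible gene-level aggregation.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class JunctionCounts:
    exon_intron: int  # reads spanning the exon-intron boundary (intron retained)
    exon_exon: int    # reads spanning the exon-exon junction (intron spliced out)

def ir_coefficient(j: JunctionCounts) -> float:
    """Fraction of junction-spanning reads that retain the intron (assumed definition)."""
    total = j.exon_intron + j.exon_exon
    if total == 0:
        return float("nan")  # no informative reads at this junction
    return j.exon_intron / total

def gene_ir(junctions: Sequence[JunctionCounts]) -> float:
    """Gene-level IR obtained by pooling reads over all junctions of the gene."""
    ei = sum(j.exon_intron for j in junctions)
    ee = sum(j.exon_exon for j in junctions)
    return ei / (ei + ee) if ei + ee else float("nan")

# Hypothetical read counts for two junctions of one gene
junctions = [JunctionCounts(exon_intron=12, exon_exon=88), JunctionCounts(exon_intron=30, exon_exon=70)]
print([ir_coefficient(j) for j in junctions])  # [0.12, 0.3]
print(gene_ir(junctions))                      # 0.21
```

Comparing such coefficients between, for example, MYC-hyperactive cells with and without BUD31 shRNA then gives a per-junction or per-gene change in IR.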
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a_ c dto0 Figure 2 | In MYC-hyperactivated cells, 
To examine the effects of spliceosome perturbation in the normal
and oncogenic MYC states, we compared the effect of BUD31 knock- 
down on junction IR coefficients in wild-type and MYC-hyperacti- 
vated cells. Notably, BUD31 depletion caused significantly more IR in 
the MYC-hyperactive state than in the MYC-normal state (Fig. 2c, 
P<10 *”). Similar results were observed when junction coefficients 
were computed on a gene level (Fig. 2d, P< 107 '®”). The increase in 
IR conferred by aberrant MYC activation and BUD31 shRNA was 
validated on individual exon-intron junctions via quantitative reverse 
transcriptase PCR (qRT-PCR) (examples in Fig. 2e-j). IR was not 
limited to a few discrete genes. Instead, BUD31 knockdown in the 
MYC-hyperactive state led to significantly increased IR in 42% of
genes analysed (2,848 of 6,861, P< 0.05). These data indicate that 
the combination of oncogenic MYC activation and partial spliceo- 
some inhibition leads to a widespread increase in IR. This is consist- 
ent with the hypothesis that the MYC-induced increase in pre-mRNA 
synthesis enhances cellular dependency on optimal spliceosome func- 
tion by raising the level of pre-mRNA substrates for spliceosomal 
processing. 


Intron-retaining pre-mRNAs often fail to complete mRNA matura- 
tion and are commonly degraded via quality control mechanisms’’. 
Because the combination of MYC hyperactivation and spliceosome 
inhibition led to a global increase in intron retention (Fig. 2c, d), we 
proposed that these cells may contain widespread defects in pre-mRNA maturation and stability (Fig. 3a). To test this hypothesis, we measured the levels of cellular poly(A)+ RNA in each of the four states (with/without MYC hyperactivation, with/without BUD31 shRNA) before and after treatment with the transcriptional inhibitor actinomycin D. After actinomycin D treatment, cellular poly(A)+ RNA decreased by comparable levels (~16–19%) in control cells with or without BUD31 knockdown (Fig. 3b). Notably, MYC-hyperactivated cells exhibited enhanced mRNA stability, perhaps resulting from increased polysomal loading of mRNA during MYC-induced translation’*. By contrast, cells containing MYC hyperactivation and BUD31 depletion exhibited a substantially greater loss (38%) of poly(A)+ RNA after actinomycin D treatment, suggesting a defect in pre-mRNA maturation and/or stability in the combined MYC-hyperactivated and BUD31-shRNA state. Similarly, fluorescence in situ hybridization (FISH) measurements of poly(A)+ RNA revealed that the combination of MYC hyperactivation and BUD31 knockdown led to a substantially greater decrease (60%) in poly(A)+ RNA after actinomycin D treatment (Extended Data Fig. 5a). Similar trends were observed in nuclear RNA pools, consistent with defects in nuclear pre-mRNA maturation (Extended Data Fig. 5b). Consistent with this decrease in pre-mRNA maturation and stability, cells containing oncogenic MYC and BUD31 knockdown exhibited significantly lower (54%) steady-state levels of poly(A)+ RNA (Fig. 3c). Collectively, these results indicate that MYC hyperactivation increases cellular pre-mRNA synthesis, and inhibition of the spliceosome reduces the cellular capacity to process this pre-mRNA burden. The result of this MYC-hyperactivated and spliceosome-hypomorphic state is enhanced intron retention, decreased mRNA maturation and stability, and a significant loss of steady-state cellular mRNA.

Figure 3 | Combined spliceosomal perturbation and MYC hyperactivation inhibits pre-mRNA maturation. a, Model of MYC–spliceosome synthetic lethality. b, Difference in cellular poly(A)+ RNA in HMECs after actinomycin D (AD) treatment (n = 3 biological replicates, two-tailed Student's t-test). c, Steady-state poly(A)+ RNA levels per cell (10⁻⁴ ng) (n = 4 biological replicates, two-tailed Student's t-test). d, Gene Ontology (GO) enrichment of intron-retained genes in the MYC-hyperactive and BUD31-depleted state. Dashed line indicates P = 0.05. e, f, In MYC-hyperactive BUD31 shRNA cells, representative genes display increased IR (e) and decreased steady-state RNA levels (f) after BUD31 knockdown. Bar colours represent GO terms; see legend in Extended Data Fig. 6. Data are mean ± s.e.m. **P < 0.01, ***P < 0.001. NS, not significant.

Gene Ontology analysis of genes with the most significant intron 
retention in the combined MYC-hyperactive and BUD31-depleted 
state (2,848 out of 6,861 genes analysed for IR) suggests that many
essential processes and subcellular structures were affected, including 
gene expression, DNA replication and repair, the mitotic spindle, 
unfolded protein response, and RNA splicing (Fig. 3d). Many genes 
participating in these essential cell processes exhibited increased IR in 
the combined MYC-hyperactive and BUD31-knockdown state (rep- 
resentative genes in Fig. 3e) and a concomitant decrease in RNA levels, 
consistent with a defect in maturation and stability of IR-containing 
transcripts (Fig. 3f). Consistent with their role in crucial cellular
processes, knockdown of these genes reduced cell number by 0.7–4.2-fold
(as quantified by barcode-tag abundance, Extended Data Fig. 6). 
Together, these data are consistent with the hypothesis that the com- 
bination of oncogenic MYC and spliceosome inhibition leads to wide- 
spread loss of mRNA integrity, resulting in the deregulation of many 
essential genes and processes instead of a single pathway. 

Because oncogenic MYC significantly increases the sensitivity of 
HMECs to inhibition of the spliceosome, we proposed that MYC-driven 
cancers may be hyperdependent on core spliceosomal function to sup- 
port their survival. We queried whether MYC-driven breast cancer 
cell lines exhibit increased sensitivity to knockdown of core spliceoso- 
mal genes. Recently, we conducted genome-wide RNAi screens in a 
panel of 72 breast cancer and immortalized cell lines for genes affecting 
cell viability (Fig. 4a) (R.M., A.S. and B.G.N., manuscript in
preparation). From this data set, we tested for a correlation between
MYC-dependency (as indicated by sensitivity to MYC shRNAs) and 
dependency on the spliceosome (as indicated by sensitivity to shRNAs 
targeting spliceosome components in the shRNA library), or on 100,000 
randomly drawn gene sets. Notably, MYC-dependent breast cancer cell 
lines were significantly more sensitive to shRNAs targeting the core 
spliceosome (Fig. 4b, P= 0.005). The correlation between MYC- 
dependency and spliceosome-dependency was significantly pronounced 
in the basal breast cancer lines (Fig. 4c, P< 0.00001), an aggressive 
molecular subtype of breast cancer frequently driven by MYC. 
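The siMEM mixed-effect score itself is not specified in this excerpt. To illustrate only the comparison logic (an observed gene-set score ranked against scores of randomly drawn gene sets), the sketch below swaps in a plain Pearson correlation across cell lines as the gene-set score. That substitution, every function name and all data are assumptions; it is not the authors' model.

```python
import math
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def set_score(myc_sens, gene_sens, genes):
    """Correlation across cell lines between MYC sensitivity and mean sensitivity to a gene set."""
    per_line = [mean(gene_sens[g][i] for g in genes) for i in range(len(myc_sens))]
    return pearson(myc_sens, per_line)

def random_set_null(myc_sens, gene_sens, set_size, n_draws, seed=0):
    """Scores of randomly drawn gene sets of the same size (the null distribution)."""
    rng = random.Random(seed)
    genes = list(gene_sens)
    return [set_score(myc_sens, gene_sens, rng.sample(genes, set_size)) for _ in range(n_draws)]

def empirical_p(observed, null_scores):
    """One-sided empirical P value: share of random sets scoring at least as high (+1 correction)."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)

# Hypothetical data: five cell lines, 200 background genes plus two spliceosome genes.
rng = random.Random(1)
myc = [0.1, 0.4, 0.5, 0.8, 0.9]
sens = {f"g{k}": [rng.random() for _ in myc] for k in range(200)}
sens.update({"SF3B1": [0.2, 0.4, 0.6, 0.7, 0.9], "BUD31": [0.1, 0.5, 0.5, 0.9, 1.0]})
obs = set_score(myc, sens, ["SF3B1", "BUD31"])
p = empirical_p(obs, random_set_null(myc, sens, 2, 10_000))
print(f"observed score {obs:.3f}, empirical P = {p:.4f}")
```

The +1 in numerator and denominator keeps the estimate away from exactly zero, so with 100,000 random sets the smallest attainable P is about 1e-5.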

Triple-negative breast cancers are commonly driven by MYC, and 
exhibit an aggressive, highly metastatic clinical course. To determine 
whether MYC-driven triple-negative breast cancers are dependent on 
spliceosomal integrity for their tumorigenic and metastatic proclivity, 
we tested the effects of genetic and pharmacological inhibition of the 
spliceosome on MYC-dependent and metastatic triple-negative breast 
cancer (TNBC) models. Inducible BUD31 shRNA reduced cell viability 
and increased apoptosis in MYC-dependent TNBC cells in vitro 
(Fig. 4d, e and Extended Data Fig. 7a, b). Similar to MYC-ER HMECs, 
MYC protein levels remained unchanged during BUD31 depletion in 
these MYC-dependent cancer cell lines (Extended Data Fig. 8a, b), sug- 
gesting that the apoptotic response was not due to loss of the driver 
oncogene (MYC). To assess the effect of spliceosomal perturbation on 
tumour growth, we established a pooled competition assay that uses 
shRNA-associated barcodes to detect changes in tumour cell fitness 
(Extended Data Fig. 9). In the metastatic TNBC cell line MDA-MB- 
231-LM2 (LM2)"”, inducible MYC-shRNA-expressing cells dropped 
out of the tumour population, confirming the MYC dependency of this 
TNBC model (Fig. 4f). Similarly, tumour cells containing BUD31 or 
SF3B1 shRNA dropped out of the tumour population (Fig. 4f). 
Tumorigenicity of another MYC-dependent TNBC model (SUM159) 
was similarly impaired by BUD31 depletion (Extended Data Fig. 7c, d). 
These data suggest that the loss of BUD31 or other core spliceosomal 
factors inhibits MYC-dependent breast cancer growth in vivo. 
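The pooled competition assay compares each shRNA barcode's share of reads in a tumour with its share in the injected cell population. A minimal sketch of that normalization follows; the function name, the pseudocount and the read counts are hypothetical, and the pseudocount is one simple way to keep barcodes that fall below detection finite, which turns their fold change into a bound rather than an exact value.

```python
import math

def normalized_abundance(tumour_counts, injected_counts, pseudocount=0.5):
    """Per-barcode frequency in a tumour divided by its frequency in the injected cell pool."""
    t_total = sum(tumour_counts.values())
    i_total = sum(injected_counts.values())
    ratios = {}
    for barcode, n_injected in injected_counts.items():
        # The pseudocount avoids zero (and log of zero) for undetected barcodes.
        t_freq = (tumour_counts.get(barcode, 0) + pseudocount) / t_total
        i_freq = (n_injected + pseudocount) / i_total
        ratios[barcode] = t_freq / i_freq
    return ratios

# Hypothetical read counts for three shRNA barcodes.
injected = {"shControl": 4000, "shMYC": 3000, "shBUD31": 3000}
tumour = {"shControl": 9500, "shMYC": 400, "shBUD31": 0}
for barcode, r in normalized_abundance(tumour, injected).items():
    print(f"{barcode}: {r:.4f} (log2 {math.log2(r):+.2f})")
```

Barcodes that drop out of the tumour population show ratios well below 1 (negative log2 values), while the control barcode rises above 1 as it takes over the population.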

Because MYC-driven breast cancers are prone to metastasize to visceral organs including the lungs”, we tested whether perturbation of spliceosome function affected metastatic expansion of MYC-dependent LM2 cells. As shown in Fig. 4g, metastatic cells with BUD31 knockdown were significantly depleted from the population (>133.5-fold change), with most doxycycline (dox)-positive tumours containing BUD31 shRNA barcodes below the level of detection. These data suggest that BUD31 and the spliceosome are essential for MYC-dependent breast tumorigenicity and metastatic expansion in vivo.

Figure 4 | In vivo perturbation of spliceosomal activity impairs MYC-dependent breast tumours and metastases. a, Schematic for identifying genetic co-dependencies in breast cancer lines. b, c, MYC-siMEM (mixed-effect model) score, which represents the correlation between cell line sensitivity to MYC shRNAs and sensitivity to shRNAs targeting random gene sets (n = 100,000; see Methods), is plotted against frequency of gene sets. Increasing MYC-siMEM values denote higher correlation with MYC-dependency. Red arrows indicate MYC-siMEM scores for spliceosome-dependency in all breast cancer lines (n = 72) (b) and the basal breast cancer subset (n = 32) (c). P value by bootstrap analysis for both. d, e, MDA-MB-231-LM2 cells with shBUD31 display diminished BUD31 protein levels (d, bottom), decreased cell numbers (d, top) (mean ± s.e.m., n = 8 technical replicates, two-tailed Student's t-test), and increased caspase-3 cleavage (e, bottom) and caspase-3/7 luminescence (e, top) (mean ± s.e.m., n = 3 technical replicates, two-tailed Student's t-test). f, g, Barcode-shRNA abundance of LM2 cells within primary tumours (f) or pulmonary metastases (g). Mean barcode abundance in each tumour or lung is normalized to the injected cell population (n = 3 technical replicates, two-tailed Student's t-test). h, Change in LM2 tumour growth after 2 weeks of vehicle (n = 13) or SD6 (n = 10) infusion. Bars indicate mean values (two-tailed Student's t-test). i, Pulmonary LM2 bioluminescence after 10-day infusion with vehicle (n = 7) or SD6 (n = 6). Bars indicate median values (Mann–Whitney test). *P < 0.05, **P < 0.01, ***P < 0.001.

Next, we tested whether pharmacological inhibition of the spliceosome
also impaired the tumorigenic and metastatic potential of MYC-dependent
TNBC cells. Compared to MYC-normal cell lines (half-maximal
inhibitory concentration (IC50) value ~53 nM), MYC-driven
cancer cells were significantly more sensitive (IC50 value ~4 nM) to the
spliceosome inhibitor SD6 in vitro (Extended Data Fig. 10a). Similarly,
SD6 suppressed the proliferation of a MYC-driven B-cell model* 
(Extended Data Fig. 10b), suggesting that oncogenic MYC may confer 
hyperdependency on the spliceosome in many epigenetic backgrounds 
and cancer types. In primary LM2 tumour xenografts, SD6 potently 
restrained tumour growth with no toxicities in any organ system exam- 
ined, suggesting that splicing is essential for the tumorigenicity of these 
MYC-dependent breast cancer cells (Fig. 4h). Similarly, SD6 impaired 
lung metastatic expansion in experimental metastasis assays (Fig. 4i), 
and extended progression-free survival (Extended Data Fig. 10c). Col- 
lectively, these data suggest that MYC-driven breast cancers depend on 
spliceosomal integrity for their tumorigenic and metastatic progression. 

Altogether, the results suggest that MYC-driven breast cancers have
an enhanced dependency on the core spliceosome. Recent studies have
shown that MYC regulates splicing of select genes via induction of
alternative splicing factors or components of the core spliceosome.
This study suggests that MYC may induce a much broader stress on
splicing via its ability to increase global pre-mRNA synthesis. Recently,
there has been considerable investigation into how MYC elicits a wide-
spread increase in mRNA synthesis across the transcriptome.
Notably, either direct or indirect mechanisms of increased pre-mRNA 
synthesis elicited by MYC could lead to an enhanced dependency on the 
spliceosome, and thus make MYC-driven cancers candidates for spliceo- 
some-based therapies. These observations provoke the important ques- 
tion of whether MYC-induced amplification of mRNA synthesis may 
also generate vulnerabilities in other aspects of RNA processing (such as 
mRNA capping, polyadenylation or mRNA export) and downstream 
protein biosynthesis in MYC-driven cancers. Notably, the spliceosome
may be a target of both oncogene addiction and oncogenic stress. 
Components of the U2 snRNP, such as SF3B1 and U2AF1, contain
frequent and recurrent somatic mutations that cluster in an evolutiona-
rily conserved domain, suggestive of oncogenic function. On the basis
of such putative oncogenic functions, the spliceosome has been proposed 
as a target for classical oncogene addiction, in which spliceosome mutant 
tumours may be addicted to the oncogenic functions of spliceosome 
mutants and thus sensitive to spliceosome inhibitors. However, this study 
and others have shown that inhibition of spliceosome components is
deleterious in cancer cell line models that lack spliceosome mutations, 
suggesting that other drivers of cancer (such as MYC) are determinants of 
sensitivity to spliceosome inhibitors. Because oncogenic MYC is known 
to drive several pro-tumorigenic programs that include rewiring of
biosynthetic pathways, our model provokes the important hypothesis
that cellular processes (such as splicing) that enable cancer cells to tolerate 
such widespread shifts in macromolecular synthesis may provide entry 
points for anti-cancer therapies. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

Vectors and virus production. Commercially available pGIPZ shRNAs targeting 
BUD31 (V2LHS_47771 and V2LHS_47770), EFTUD2 (V2LHS_28167), SF3B1 
(V3LHS_397872), SNRPF (V2LHS_276933) and U2AF1 (V2LHS_84677) were
obtained from Open Biosystems. shRNAs targeting the 3’ UTR region of
BUD31 were designed using the BIOPREDsi and RNAi Codex algorithms
(shRNA sequence 5’-TGCTGTTGACAGTGAGCGCCGCTGTCTATCAGCTGTGATTTAGTGAAGCCACAGATGTAAATCACAGCTGATAGACAGCGATGCCTACTGCCTCGGA-3’).
For inducible RNAi experiments, shRNAs were subcloned into the pINDUCER
dox-inducible lentiviral expression system.
Lentiviruses and retroviruses were produced by transiently transfecting shRNA
or cDNA constructs into 293T cells using Mirus Bio TransIT transfection protocols,
and viral supernatants were collected 48 h after transfection.

Cell culture. HMECs expressing hTERT and inducible MYC-ER (MYC-ER 
HMECs), F7 epithelial cells and human mammary epithelial HME1 cells were
cultured in mammary epithelial growth medium (MEGM, Lonza). 293T cells,
HeLa cells and MDA-MB-231-LM2 human breast cancer cells were cultured in
DMEM (Gibco) supplemented with 10% FBS. SUM159 human breast cancer cells
were cultured in F12 (Gibco) medium supplemented with 5% FBS, 10 mM HEPES
(Gibco), 5 μg ml−1 insulin (Invitrogen) and 1 μg ml−1 hydrocortisone. The
P493-6 human B-cell lymphoma cell line was cultured in RPMI-1640 supplemented
with 10% FBS (Clontech) and 1% GlutaMAX (Invitrogen). All cell lines were
incubated at 37 °C and 5% CO2. Cell lines were obtained from ATCC, and all cell
lines are tested yearly for mycoplasma contamination. Stable cell lines expressing
shRNAs or cDNAs were generated by lentiviral or retroviral transduction in the
presence of 8 μg ml−1 polybrene followed by selection with appropriate antibiotic
resistance markers. 

Cell proliferation assays. MYC-ER HMECs were infected with pINDUCER-
shRNA viruses at a multiplicity of infection (MOI) of 1.3-1.5, and transduced
cells were seeded at a density of 3,000 cells per well into 96-well black plates
(Corning). MYC-ER HMECs with pINDUCER-shBUD31-3'UTR were treated with
300 nM 4-hydroxytamoxifen (4-OHT) to induce MYC hyperactivation, and with
32 ng ml−1 dox (Sigma) to induce shBUD31 expression. SUM159 and MDA-MB-231-
LM2 (LM2) cells were infected with pINDUCER-shBUD31 virus (targeting the 3’
UTR and coding region, respectively) at an MOI of 1.5, and seeded at densities of
1,000 and 2,000 cells per well, respectively. Expression of shBUD31 in LM2 and
SUM159 cells was induced with 1 μg ml−1 dox. HMECs and breast cancer cells were
re-fed every 3-4 days until cells reached confluence. At confluence, cells were fixed in 4%
paraformaldehyde, and nuclei were stained with Hoechst3321 (1:1,000, Life
Technologies). Nuclei were imaged and counted using the Celigo Imaging Cell 
Cytometer (Brooks). 

For clonogenic assays, breast cancer or immortalized epithelial cells were seeded 
at low density (between 500 and 2,000 cells per plate, depending on the cell line) 
into 6-cm plates, four replicates per treatment group. MYC-ER HMECs with 
pINDUCER-shBUD31-3'UTR were treated with 8 ng ml−1 dox and 300 nM
4-OHT, and MYC-ER HMECs treated with 10 or 20 nM SD6 were also cultured
with 200 nM 4-OHT. Cells were re-fed every 4 days until colonies were
macroscopic. The colonies were stained using Coomassie brilliant blue. Macroscopic
colonies were quantified and normalized to vehicle-treated cells for each cell line. 

For the P493-6 cell line with the pmyc-tet construct, MYC was reduced by treating
cells with 0.1 μg ml−1 tetracycline (Sigma) for 72 h. MYC was induced by washing
P493-6 cells twice with PBS, then culturing cells in RPMI-1640 medium with 10%
Tet System Approved FBS (Clontech) and 1% GlutaMAX. P493-6 cells were
treated with or without 100 nM SD6 and with or without 0.1 μg ml−1 tetracycline
for 4 days. 

Immunoprecipitation and mass spectrometry. HeLa cells transduced with
lentivirus encoding BUD31 cDNA and non-transduced HeLa cells were harvested,
and nuclear extracts as well as whole-cell lysates were prepared as described
previously. Lysates were treated with RNase A (500 μg ml−1) for 1 h on ice.
For immunoprecipitations, nuclear and whole-cell extracts were ultracentrifuged
at 100,000g, and incubated with 25 μg M2 Flag antibody (Sigma) for 1 h, followed
by ultracentrifugation and incubation with Sepharose CL-4B Protein A beads (GE
Healthcare). Beads were washed with NTN (50 mM Tris-Cl, pH 8.0, 150 mM NaCl
and 0.5% NP-40), and immunocomplexes were resuspended in 1× Laemmli
buffer and resolved on pre-cast 4-20% Novex Tris-Glycine gels (Life
Technologies). Gels were minimally stained with Coomassie brilliant blue, cut into 8
molecular mass ranges, and digested with trypsin. Immunocomplex components were
identified on a Thermo Fisher LTQ mass spectrometer, and data processing was
performed as previously described.

Enrichment analysis. The human GO annotation file (gene_association.goa_human.gz),
with a GOC validation date of 2 September 2013, was downloaded from
http://geneontology.org/GO.downloads.annotations.shtml. Enrichment analysis
was performed on (1) BUD31-associated proteins and
(2) genes with enhanced intron retention (IR). Gene symbols annotated to BUD31-associated
proteins were cross-tabulated against all Gene Ontology annotations. Genes with
enhanced IR were cross-tabulated against the subset of Gene Ontology annotations
for genes considered in this analysis. We used Fisher’s exact test to determine P
values for the proportion of genes overlapping each annotation set.
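A minimal, standard-library Python sketch of the per-annotation test described above. It builds the 2 × 2 table (gene in the hit list or not, annotated to a GO term or not) within an assumed background universe and computes a two-sided Fisher's exact P value by summing every table with the same margins that is no more probable than the observed one, which is the convention used by R's `fisher.test`. The function names and the choice of universe are illustrative assumptions, not the authors' code.

```python
from math import comb


def hypergeom_pmf(x, row1, col1, n):
    """P(top-left cell == x) for a 2x2 table with fixed margins."""
    return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more probable than the observed one (small tolerance for ties).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = hypergeom_pmf(a, row1, col1, n)
    p = sum(
        hypergeom_pmf(x, row1, col1, n)
        for x in range(lo, hi + 1)
        if hypergeom_pmf(x, row1, col1, n) <= p_obs * (1 + 1e-7)
    )
    return min(p, 1.0)


def annotation_enrichment(hits, annotated, universe):
    """Overlap count and Fisher P value for one annotation set.

    hits: genes of interest (e.g. BUD31-associated proteins);
    annotated: genes carrying one GO term; universe: background genes.
    """
    universe = set(universe)
    hits = set(hits) & universe
    annotated = set(annotated) & universe
    a = len(hits & annotated)      # in hit list and annotated
    b = len(hits - annotated)      # in hit list, not annotated
    c = len(annotated - hits)      # annotated, not in hit list
    d = len(universe) - a - b - c  # neither
    return a, fisher_exact_two_sided(a, b, c, d)
```

Calling `annotation_enrichment` once per GO term yields one P value per annotation set; whether and how to correct for multiple testing is a separate choice.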

Bimolecular fluorescence complementation (BiFC). BUD31 was cloned into the
pQCXIN-N-YFP fusion vector, in which the BUD31 N terminus was fused to the
N-terminal fragment (residues 1-155) of Venus yellow fluorescent protein (YFP).
Human splicing factor cDNAs were individually recombined into retroviral vectors
that fuse the C-terminal fragment of Venus YFP (residues 156-239) to their
N termini. SUM159 breast cancer cells were
transduced with these bait and prey BiFC retroviruses, and cellular fluorescence 
was analysed by flow cytometry in triplicate. 

BUD31 mutagenesis. Wild-type and mutant BUD31 cDNAs were generated by 
gene synthesis (IDT DNA) and recombined into the pQCXIN-N-YFP fusion 
vector. In mutant BUD31, human BUD31 amino acid residues 105-114 were
substituted with an equal number of glycine residues (codon GGA).

In vitro competition assay. MYC-dependent SUM159 breast cancer cells with 
pINDUCER-shBUD31-3’ UTR were transduced with viruses containing wild-type 
or mutant BUD31 or negative control cDNA recombined into pQCXIN-N-YFP 
vectors. Infected GFP+ cells were mixed at an 80:20 ratio with non-transduced
GFP− parental cells, seeded into 96-well plates and treated with or
without dox (1 μg ml−1). At confluence, cells were passaged 1:10 and processed
for flow cytometry. The in vitro competition assay was continued for two passages.

Immunoblotting. Cells were lysed in 1× SDS sample buffer (62.5 mM Tris-HCl,
pH 6.8, 10% glycerol, 2% SDS, 2.5% β-mercaptoethanol) and heated at 95 °C for
12 min. The following antibodies were used for western blotting: Flag (Sigma,
A8592), BUD31 (ProteinTech, 11798-1-AP), SF3B1 (Bethyl, A300-996A), Prp19 
(Bethyl, A300-101A), U2AF1 (Bethyl, A302-079A), SF3A1 (Bethyl, A301-603A), 
EFTUD2 (Bethyl, A300-957A), SNRPF (Abcam, 154870), HER2 (Millipore, 06- 
562), EGFR (Cell Signaling, 2232), cleaved caspase-3 (Cell Signaling, 9664), RPS8 
(Assay Biotechnology, R12-3466), EIF2S1 (Abgent, AP13469s), eIF3I (p36) 
(Biolegend, 646701) and c-Myc (D84C12) (Cell Signaling, 5605). Vinculin 
(Sigma, V9131) and Ran (BD Biosciences, 610340) were used as loading controls. 

In vitro transcription. Uniformly 32P-UTP-radiolabelled MINX pre-mRNA was
in vitro transcribed from a BamHI-digested plasmid, treated with DNase I (Ambion)
and gel-isolated on an 8 M urea 6% polyacrylamide gel.

In vitro splicing. HeLa nuclear extracts used for in vitro splicing assays were made 
as described previously from HeLa cells transduced with an inducible BUD31-
targeting shRNA and grown in the presence or absence of 1 μg ml−1 dox. Splicing
reactions of 15 μl contained 8 nM RNA substrate, 0.8 mM DTT, 1.7 mM
magnesium acetate, 1.7 mM ATP, 17 mM phosphocreatine, 20 mM glycine, 1 U μl−1
RNasin Plus (Promega), 3.7% PVA and 50 μg of HeLa nuclear extract. Splicing
reactions were incubated for the indicated times at 30 °C and stopped by
digestion with proteinase K (Ambion) for 30 min at 45 °C followed by RNA
purification. RNA purified from splicing reactions was electrophoresed on 8 M urea 8%
polyacrylamide gels, then exposed to a phosphorimager screen (Typhoon Trio
phosphorimager, GE Healthcare). Alternatively, RNA purified from in vitro
splicing reactions was added to RT-PCR reactions as previously described with
primers in exons 1 and 2 of MINX (forward: 5’-CGGAATTCGAGCTCGCCC-3' 
and reverse: 5’-GGATCCCCACTGGAAAGA-3’). PCR products were run on 6% 
non-denaturing polyacrylamide gels and visualized after staining with ethidium 
bromide. 

Spliceosome complex formation assay. In vitro splicing reactions were carried 
out as described above, placed on ice, and heparin was added to a final
concentration of 2 μg μl−1. Reactions were incubated in the presence of heparin at 30 °C
for 5 min and immediately loaded onto 0.75-mm non-denaturing 4% acrylamide-
0.4% agarose composite gels. Gels were run at 250 V at room temperature in 1×
tris-glycine running buffer for 3 h, then placed on Whatman paper and exposed to 
a phosphorimager cassette. 

RNA isolation and qRT-PCR. RNA isolation was performed with the RNeasy 
Mini kit (Qiagen). Reverse transcription was performed using the High Capacity 
RNA-to-cDNA Master Mix (Applied Biosystems), and qPCR was performed 
using SYBR Green Master Mix (Applied Biosystems). The following primers were
used:
BUD31 forward: 5'-ACCAACTTCGGGACGAACTG-3', reverse: 5'-CGGCCCACTTCCAGCTT-3';
EFTUD2 forward: 5'-CCTTCGTGTTGTCAGAGAGTGTCT-3', reverse: 5'-TGGGTTGGAGGTTGGTGAGT-3';
SF3B1 forward: 5'-GTGGACAAAATGGCGAAGAT-3', reverse: 5'-GAGCTTCATCAAGAGCTGCC-3';
SNRPF forward: 5'-GGGAATGGAGTACAAGGGCT-3', reverse: 5'-CCCAGATGTCCAGACAAAGC-3';
U2AF1 forward: 5'-ACGTTTAGCCAGACCATTGC-3', reverse: 5'-TGTTCCTGCATCTCCACATC-3';
GAPDH forward: 5'-CCTCCCGCTTCGCTCTCT-3', reverse: 5'-TGGCGACGCAAAAGAAGAT-3'.

RNA-seq. pINDUCER11-shBUD31-3' UTR-infected MYC-ER HMECs were cultured
in triplicate for 72 h with/without 16 ng ml−1 dox, and for 48 h with/without 300 nM
tamoxifen. Total RNA was isolated using the RNeasy kit (Qiagen).
RNA samples were rRNA depleted, and NGS libraries were constructed and
sequenced as 75-bp paired-end reads on an Illumina HiSeq 2000.

Quality assessment of RNA-seq. The quality of RNA-seq reads was evaluated
using the FastQC application (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).

Alignment of RNA-seq data. RNA-seq reads were mapped using the STAR
RNA-seq aligner (version 2.3.1). To improve mapping accuracy, the database
file of splice junctions (http://it-collab01.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/GENCODE/Old/gencode.v14.annotation.gtf.sjdb)
was supplied at the genome index generation step with the command-line option
--sjdbOverhang 75, together with http://it-collab01.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/GENCODE/Old/hg19_Gencode14.overhang75/
and default parameters. Duplicate reads were marked with the MarkDuplicates
function of the Picard-tools software package (http://picard.sourceforge.net; version
1.107) using default settings.

Intron-exon junction definition. To prevent confounding effects in our analysis 
of IR within HMECs, we confined our analyses to exons in non-overlapping 
genes that are included within all isoforms of a given gene (75,623 junctions in 
6,861 genes). 

Intron-exon junctions were obtained using the University of California 
Santa Cruz Genome Browser ‘knownGene’ table (downloaded 4 June 2014). 
Constitutive junctions were defined as junctions that (1) appear in each transcript 
annotated to a given gene symbol, (2) do not overlap with any transcript annotated 
to a different gene symbol, and (3) do not mark the start or stop of a transcript. 
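The three criteria above can be sketched as a filter over transcript records. This is a simplified illustration, not the authors' code: it assumes hypothetical dictionary records with `symbol`, `chrom`, `tx_start`, `tx_end` and `exons` (a list of `(start, end)` pairs, as might be parsed from a `knownGene`-style table), treats each exon boundary coordinate as a junction, ignores strand, and tests overlap against the full span of other genes' transcripts.

```python
from collections import defaultdict


def internal_boundaries(tx):
    """Exon boundary coordinates that are not the transcript start or stop."""
    coords = set()
    for start, end in tx["exons"]:
        coords.update((start, end))
    # criterion (3): drop boundaries that mark the transcript start/stop
    coords.discard(tx["tx_start"])
    coords.discard(tx["tx_end"])
    return coords


def constitutive_junctions(transcripts):
    """Map each gene symbol to its sorted list of constitutive junctions."""
    by_symbol = defaultdict(list)
    for tx in transcripts:
        by_symbol[tx["symbol"]].append(tx)

    result = {}
    for symbol, txs in by_symbol.items():
        # criterion (1): boundary present in every transcript of this symbol
        shared = set.intersection(*(internal_boundaries(tx) for tx in txs))
        chrom = txs[0]["chrom"]
        others = [t for t in transcripts
                  if t["symbol"] != symbol and t["chrom"] == chrom]
        # criterion (2): no overlap with transcripts of a different symbol
        result[symbol] = sorted(
            p for p in shared
            if not any(o["tx_start"] <= p <= o["tx_end"] for o in others)
        )
    return result
```

Because the intersection is taken after removing each isoform's own start and stop, a boundary that is internal in one isoform but terminal in another is also excluded.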

Junction IR calculation. Because analysis of intronic reads may be influenced by
the presence of stable RNAs within introns and/or spliced lariats, we calculated 
junction IR as the ratio of exon-intron reads to exon-exon reads, restricting the 
analysis to reads directly spanning exon-intron or exon-exon junction sequences. 
We used R together with the Rsamtools package to calculate IR. In brief, for each 
intron-exon junction, we extracted all non-duplicate reads overlapping this junc- 
tion. Next, we assigned these reads into two categories: (1) ‘intronic’ if the read 
mapped to at least the first base of the intron, (2) ‘exonic’ if none of the bases of the 
read mapped to the first base of the intron and at least one base mapped to a 
subsequent exon. We counted the total number of reads assigned into each cat- 
egory for each junction. IR was calculated as: 


$$\mathrm{IR}_{ij} = \log_2 \frac{I_{ij} + 1}{E_{ij} + 1}$$

in which $\mathrm{IR}_{ij}$ represents the IR score for junction $j$ in sample $i$, and $I_{ij}$ and $E_{ij}$ refer
to the count of reads classified as intronic and exonic for junction $j$ in sample $i$,
respectively. To avoid ratios with 0 in the denominator, we added 1 to each of these 
counts. The scripts used to conduct this calculation are available on request. 
We restricted all subsequent analyses to intron-exon junctions with an average of
at least 25 total (intronic and exonic) reads in the control and MYC-hyperactivated
samples.
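A minimal Python sketch of the junction-level calculation and depth filter above (the authors used R with Rsamtools). Reads are represented here as lists of inclusive `(start, end)` aligned blocks on the plus strand, already restricted to non-duplicate reads overlapping the junction; this representation, the function names and the exact form of the depth check are assumptions for illustration.

```python
import math


def classify_read(blocks, intron_first_base, next_exon_start):
    """Return 'intronic', 'exonic' or None for one read at one junction."""
    if any(s <= intron_first_base <= e for s, e in blocks):
        return "intronic"  # covers at least the first base of the intron
    if any(e >= next_exon_start for s, e in blocks):
        return "exonic"  # skips the intron's first base, reaches the next exon
    return None


def junction_ir(reads, intron_first_base, next_exon_start):
    """IR = log2((I + 1) / (E + 1)); also returns total classified reads."""
    n_intronic = n_exonic = 0
    for blocks in reads:
        label = classify_read(blocks, intron_first_base, next_exon_start)
        if label == "intronic":
            n_intronic += 1
        elif label == "exonic":
            n_exonic += 1
    ir = math.log2((n_intronic + 1) / (n_exonic + 1))
    return ir, n_intronic + n_exonic


def passes_depth_filter(total_counts, min_mean=25):
    """Keep a junction if its mean total (I + E) read count is >= min_mean."""
    return sum(total_counts) / len(total_counts) >= min_mean
```

The pseudocount of 1 keeps the ratio finite when a junction has no exonic reads, and it pulls sparsely covered junctions towards IR = 0.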

Gene IR calculation. For the cumulative distribution analyses, the IR scores of
all junctions in a gene were averaged to give a gene-level IR score.

Gene annotation. A custom gene annotation file was generated to correspond to
the set of intron-exon junctions considered in the IR analysis. In brief, exons were
defined as: (1) an exon flanked by two junctions annotated to the same symbol, 
and (2) an exonic region flanked by one junction and conserved across all tran- 
scripts annotated to the same symbol. 

Statistical analysis of RNA-seq. Statistical analyses were performed using the 
open-source statistical programming environment ‘R’. Empirical cumulative
distributions of IR scores were compared using two-sided Kolmogorov-Smirnov
and Wilcoxon tests.
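As an illustration of the gene-level averaging and the distribution comparison, the sketch below averages junction IR scores per gene and computes the two-sample Kolmogorov-Smirnov statistic D, the largest vertical gap between two empirical cumulative distributions. It computes only the statistic; P values, as reported in the paper, would come from R's `ks.test` and `wilcox.test` or an equivalent. The `(gene, IR)` input format is an assumption.

```python
import bisect
from collections import defaultdict
from statistics import mean


def gene_ir(junction_scores):
    """junction_scores: iterable of (gene, IR) pairs -> {gene: mean IR}."""
    by_gene = defaultdict(list)
    for gene, ir in junction_scores:
        by_gene[gene].append(ir)
    return {gene: mean(values) for gene, values in by_gene.items()}


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: max vertical gap between the ECDFs.

    Both ECDFs are right-continuous step functions, so evaluating them at
    every pooled data point is enough to find the supremum.
    """
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d
```

A rightward shift of the IR distribution in one condition shows up as a large D between, for example, control and BUD31-depleted gene-level scores.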

Permutation-based test of significance. The significance of the difference of 
empirical cumulative distributions of junction-level IR scores was evaluated using 
a permutation-based approach. The null hypothesis was that splicing perturba- 
tions had no effect on IR changes in the MYC-normal and MYC-hyperactivated 
states. To model this null hypothesis, the treatment information was blinded to the 
assignments of MYC activity. We generated a third control sample by randomly 
selecting half of the junctions from samples 1-control and 2-control. Control and 
MYC samples were grouped as six ‘normal’ samples, and LowBUD31 and
Myc_LowBUD31 samples were grouped as ‘splicing perturbed’ samples. Next, 
we comprehensively generated all possible normal and splicing perturbed con- 
trasts by subtracting the average of each junction IR score of three normal samples 
from that of the splicing perturbed group. The empirical distribution of all possible 
double differences was generated and used to assign significance to the original 
observations. An analogous approach was used to evaluate the difference of empir- 
ical cumulative distributions of gene-level IR scores. 
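The procedure above is described only briefly, so the sketch below implements one simplified reading of it rather than the authors' analysis. Each sample is reduced to a single hypothetical summary IR value, the double difference is (MYC_LowBUD31 − MYC) − (LowBUD31 − Control) on group means, and the null distribution comes from exhaustively reassigning MYC labels within the unperturbed and within the BUD31-depleted groups. The synthetic third control sample and the junction-level distributions are omitted.

```python
from itertools import combinations
from statistics import mean


def double_difference(ctrl, myc, low, myc_low):
    """(MYC_LowBUD31 - MYC) minus (LowBUD31 - Control), on group means."""
    return (mean(myc_low) - mean(myc)) - (mean(low) - mean(ctrl))


def relabelings(group_a, group_b):
    """Yield every split of the pooled samples into groups of the same sizes."""
    pooled = list(group_a) + list(group_b)
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        yield ([pooled[i] for i in idx],
               [pooled[i] for i in range(len(pooled)) if i not in chosen])


def permutation_test(ctrl, myc, low, myc_low):
    """Observed double difference and one-sided exact permutation P value.

    MYC labels are reassigned within the unperturbed and the BUD31-depleted
    groups, so the null distribution has no association between MYC status
    and IR. The observed labelling is one of the enumerated splits.
    """
    observed = double_difference(ctrl, myc, low, myc_low)
    null = [double_difference(c, m, lb, mlb)
            for c, m in relabelings(ctrl, myc)
            for lb, mlb in relabelings(low, myc_low)]
    p = sum(v >= observed for v in null) / len(null)
    return observed, p
```

With three samples per group there are 20 × 20 = 400 relabellings, so the smallest attainable P value is 1/400.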


qPCR IR validation assay. Amplification reactions were prepared using SYBR
Select Master Mix (Applied Biosystems) according to the manufacturer’s instructions
with a final primer concentration of 300 nM. Reactions were performed using
a StepOnePlus Real-Time PCR System (Applied Biosystems) with an initial
incubation at 95 °C for 10 min followed by 40 cycles of 15 s at 95 °C and 1 min
at 60 °C. Primers were designed using Primer3 (available at http://bioinfo.ut.ee/primer3/)
and assessed for quality using Beacon Designer (http://www.premierbiosoft.com/qpcr/)
and UNAFold (https://www.idtdna.com/UNAFold). Primers
used for each reaction were:
HTRA1_IE forward: 5'-GCGTTCATTTTAAGGTGCTACAGG-3', reverse: 5'-TGGGCATTTGTCACGATCAGT-3';
HTRA1_EE forward: 5'-GACGTGGTGGAGAAGATCGC-3', reverse: 5'-AAACCCAGACCCACTAGCCA-3';
PRPF19_IE forward: 5'-TCCCCTTGTGTGACCTTCTCT-3', reverse: 5'-AGAATCTCCGTCCATTGTTTGC-3';
PRPF19_EE forward: 5'-AGAACTTTAAGACTTTGCAGCTGG-3', reverse: 5'-TCTCCGTCCATTGTTTGCAGA-3';
UBALD2_IE forward: 5'-GCTGCGTTTCCTGACTCCG-3', reverse: 5'-GTGGTGGCTGTTGGGAATGT-3';
UBALD2_EE forward: 5'-CAGTTGCTGCAGGCGGCC-3', reverse: 5'-TGGAAGAACGTGCTCAGCGC-3'.

Ultramer oligonucleotides (Integrated DNA Technologies) were synthesized
to match the predicted amplicon of each primer pair, and standard curves
were generated for each reaction using threefold serial dilutions of these control
templates ranging in concentration from 4.0 × 10−12 M to 1.6 × 10−14 M. The
sequence AAGAA was added to both the 5' and 3' ends of each template to
facilitate primer binding. Control template sequences for intron-exon (IE) and
exon-exon (EE) regions were as follows:
HTRA1_IE: 5'-AAGAAGCGTTCATTTTAAGGTGCTACAGGCTTAAGTGTGTACTCCTTTGGATTTTAGGCTTCCGTTTTCTAAACGAGAGGTGCCGGTGGCTAGTGGGTCTGGGTTTATTGTGTCGGAAGATGGACTGATCGTGACAAATGCCCAAAGAA-3';
HTRA1_EE: 5'-AAGAAGACGTGGTGGAGAAGATCGCCCCTGCCGTGGTTCATATCGAATTGTTTCGCAAGCTTCCGTTTTCTAAACGAGAGGTGCCGGTGGCTAGTGGGTCTGGGTTTAAGAA-3';
PRPF19_IE: 5'-AAGAATCCCCTTGTGTGACCTTCTCTCTTTCTATTTCTGGCAGGTAAAGTCACTGATCTTTGACCAGAGTGGTACCTACCTGGCTCTTGGGGGCACGGATGTCCAGATCTACATCTGCAAACAATGGACGGAGATTCTAAGAA-3';
PRPF19_EE: 5'-AAGAAAGAACTTTAAGACTTTGCAGCTGGATAACAACTTTGAGGTAAAGTCACTGATCTTTGACCAGAGTGGTACCTACCTGGCTCTTGGGGGCACGGATGTCCAGATCTACATCTGCAAACAATGGACGGAGAAAGAA-3';
UBALD2_IE: 5'-AAGAAGCTGCGTTTCCTGACTCCGCCTGGCCCGCCGTGTCACTGCCCTGTTTGTCCGCAGACCGCGCTGAGCACGTTCTTCCAAGAAACCAACATTCCCAACAGCCACCACAAGAA-3';
UBALD2_EE: 5'-AAGAACAGTTGCTGCAGGCGGCCCACTGGCAGTTCGAGACCGCGCTGAGCACGTTCTTCCAAAGAA-3'.

Ct values from each reaction were interpolated on the standard curve generated
using the corresponding control template to approximate the concentration of 
cDNA template in each experimental sample. These values were then reported as 
the ratio of intron-exon to total (IE + EE) cDNA template in each sample. 
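The interpolation above follows the usual absolute-quantification logic: Ct is linear in the log10 of template concentration, so a least-squares line fitted to the control templates can be inverted for each experimental Ct. The sketch assumes one standard curve per primer pair; the example dilution series and Ct values are hypothetical.

```python
import math


def fit_standard_curve(concentrations, cts):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate_concentration(ct, slope, intercept):
    """Invert the standard curve to estimate template concentration."""
    return 10 ** ((ct - intercept) / slope)


def intron_retention_fraction(ct_ie, curve_ie, ct_ee, curve_ee):
    """IE / (IE + EE), each estimated from its own standard curve."""
    ie = interpolate_concentration(ct_ie, *curve_ie)
    ee = interpolate_concentration(ct_ee, *curve_ee)
    return ie / (ie + ee)


# Hypothetical threefold dilution series of six control templates (M).
STANDARDS = [4.0e-12 / 3 ** k for k in range(6)]
```

Fitting a separate curve for the IE and EE amplicons corrects for differences in amplification efficiency between the two primer pairs before the ratio is taken.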

Transcription pulse assay. MYC-ER HMECs with pINDUCER11-shBUD31-
3'UTR were cultured with/without 16 ng ml−1 dox and/or with/without 300 nM
4-OHT. Cells were pulsed with 500 μM 4-thiouridine (4-SU, Sigma) for 2 h, and
total RNA was collected using the RNeasy mini kit (Qiagen). 4-SU-labelled RNA was
purified from 20 μg total RNA. Newly transcribed RNA was isolated
as described using 100 μl streptavidin beads (Miltenyi Biotec).

Poly(A)+ RNA isolation. Dynabeads Oligo(dT)25 (Life Technologies) were
equilibrated with 50 μl lysis/binding buffer, and total RNA was heat denatured (70 °C
for 2 min) before binding poly(A)+ RNA to Dynabeads. mRNA was isolated
according to the manufacturer’s instructions. Poly(A)+ RNA concentrations
were measured with a fluorescence plate reader (Molecular Devices) using
Quant-iT RiboGreen reagent (Life Technologies).

Poly(A)+ RNA LNA FISH. pINDUCER11-shBUD31-3’UTR-transduced MYC-
ER HMECs were seeded onto collagen-coated 8-well glass chamber slides and
cultured with/without 16 ng ml−1 dox and with/without 300 nM tamoxifen. Cells
were treated with 2 μg ml−1 actinomycin D (Gibco) or DMSO for 5 h
before fixation in 4% formaldehyde and 5% acetic acid in PBS for 15 min at room
temperature. Fixed cells were washed with PBS, permeabilized with proteinase K
(5 μg ml−1, Life Technologies) and treated with/without RNase A (100 μg ml−1,
Sigma) for 30 min at 37 °C in PBS. Cells were dehydrated with
70%, 95% and 100% ethanol. FITC-labelled oligo(dT) locked nucleic
acid (LNA) probes were heated to 90 °C for 4 min, then cooled to the hybridization
temperature (55 °C). Dehydrated and dried cells were incubated with 40 nM LNA
probes in hybridization buffer (50% formamide, 2× SSC, 50 mM NaPi, pH 7.0,
10% dextran sulphate) overnight at 55 °C. Chamber slides were washed with
5× SSC, 1× SSC, 0.2× SSC and PBS, and dehydrated before counterstaining
with DAPI and mounting with Fluoromount-G (Southern Biotech). Cells were
imaged using a Nikon Ti-E inverted microscope with a 40× air objective and an Andor
Zyla 4.2 sCMOS camera. For each treatment condition and actinomycin time
point, ≥150 cells were analysed for mean FITC intensity. Cellular FITC values
were adjusted for background fluorescence by subtracting the mean extracellular
pixel value. Image analysis was performed using Nikon Elements.

©2015 Macmillan Publishers Limited. All rights reserved

Luminescent apoptosis assays. Caspase-3/7 activity was assessed in MYC-ER 
HMECs and breast cancer cell lines by incubating Caspase-Glo 3/7 Reagent with 
cells in triplicate wells of a 96-well plate and measuring luminescence with a plate 
reader (Molecular Devices). Luminescence was normalized using cell numbers 
determined by Hoechst3321 staining of a duplicate plate, followed by nuclei
counting using the Celigo Imaging Cell Cytometer (Brooks). 

Tumorigenicity and metastasis assays. SUM159 breast cancer cells were trans- 
duced with pINDUCER11-shBUD31-3'UTR virus and analysed by flow cytome- 
try to confirm >98% transduction. In total 8 X 10° transduced cells were injected 
with Matrigel (BD Biosciences) subcutaneously into the flank of four-week-old
female athymic nude Foxn1-nu mice (Harlan Labs). Tumour volume was
measured using calipers, and once tumours reached 150 mm³, mice were randomized
onto and maintained on sucrose water (−dox) or sucrose water with dox (+dox).

For mixed population experiments, MDA-MB-231-LM2 breast cancer cells 
were individually transduced with pINDUCER11-shRNAs targeting the indicated 
genes at an MOI appropriate to transduce all cells (1.3-1.5). The individual popu- 
lations were mixed at equal ratios in vitro and expanded before injection. Around 
3 X 10° or 2 X 10° mixed population cells were injected subcutaneously into the 
right flank or into the lateral tail vein of four-week-old female athymic nude 
Foxn1-nu mice (Harlan Labs), respectively. Subcutaneous tumour volume was
measured with calipers over time. Mice were randomized onto sucrose water
(−dox) or sucrose water with dox (+dox) after tumours exceeded 150 mm³.
Lung metastatic progression was monitored and quantified using noninvasive
bioluminescence as described previously. When tumours reached 1,000 mm³ or
the total luminescence flux reached 1 X 10°, genomic DNA from dissected tumours
or lungs was collected using the QIAamp DNA mini kit (Qiagen). qPCR was
performed with SYBR Green PCR Master Mix (Life Technologies) using the
manufacturer’s recommendations and the following primers. Experimental target Ct
values were normalized to the TRE Ct values, and NCOR2 was used as a negative
control.
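The Methods state only that target Ct values were normalized to TRE Ct values, and the Fig. 4 legend adds that barcode abundance was normalized to the injected cell population. One common way to express both steps, assuming roughly 100% amplification efficiency for every primer pair, is the comparative Ct (2^−ΔΔCt) calculation sketched below; this is an assumption, not necessarily the authors' exact formula.

```python
def relative_to_tre(ct_target, ct_tre):
    """Barcode abundance relative to all TRE-bearing vectors: 2^-(Ct_target - Ct_TRE).

    Assumes ~100% amplification efficiency for both primer pairs.
    """
    return 2.0 ** -(ct_target - ct_tre)


def fold_change_vs_injected(sample_cts, injected_cts):
    """2^-ddCt of a tumour or lung sample relative to the injected population.

    Each argument is a (Ct_target, Ct_TRE) pair.
    """
    return relative_to_tre(*sample_cts) / relative_to_tre(*injected_cts)
```

For example, a barcode whose ΔCt rises from 2 in the injected cells to 10 in a tumour is depleted 2^8 = 256-fold.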

The following primers were used:
BUD31 forward: 5'-TGGAAGACATCTGCGTGGTATT-3', reverse: 5'-CGCGCAAACCTAAAGGCATA-3';
SF3B1 forward: 5'-GCCGTATCATTAGTACGCCATA-3', reverse: 5'-TCGATCCTAGGACGGGGTAT-3';
MYC forward: 5'-GCCGGCCATATTTTCACTTC-3', reverse: 5'-CACACCTACCGAAAAACAAAC-3';
NCOR2 forward: 5'-AACTTCCGGTGCTGTCGTTT-3', reverse: 5'-CGCGTCCTAGGTAATACGACTCA-3';
TRE forward: 5'-TGTACGGTGGGAGGCCTATATAA-3', reverse: 5'-GCGTCTCCAGGCGATCTG-3'.

For SD6 drug infusion studies, 3 X 10° or 2 X 10° MDA-MB-231-LM2 breast 
cancer cells were injected into the flank or lateral tail vein of four-week-old female 
athymic nude Foxn1-nu mice (Harlan Labs), respectively. For mice with
subcutaneous tumours, jugular vein catheters (SAI Infusion Technologies) were surgically
implanted 13-16 days after injection, and mice were randomized to
receive vehicle (n = 11) or SD6 (n = 10) infusion. Tail-vein-injected mice were
randomized to receive vehicle (n = 7) or SD6 (n = 6) infusion 1 day after tail vein
injection. Animals received daily infusions of vehicle (10% 2-hydroxypropyl-β-cyclodextrin
dissolved in 50 mM Na2HPO4/NaH2PO4, pH 7.4) or 50 mg kg−1 of
SD6 for 20 consecutive days (subcutaneous cohort) or 10 consecutive weekdays
(tail vein cohort). Mice were infused via jugular catheter at a rate of 3.5 μl min−1
with a Fusion 200 Touch Syringe Pump (SAI Infusion Technologies). The total
volume infused did not exceed 500 μl per day. Subcutaneous tumour volumes were
monitored with calipers, and lung metastatic progression was monitored with
noninvasive bioluminescence. Mice were euthanized once tumours reached
2,000 mm³ or the total luminescence flux reached 1 X 10”. In progression-free
survival analyses, progression was defined as a fivefold increase in pulmonary
bioluminescence relative to initial values or a fourfold increase in subcutaneous tumour
volume relative to its volume at the time of randomization.
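The progression rule above translates into a time-to-event record per mouse, the input a log-rank or Kaplan-Meier analysis needs. In this sketch, a fivefold (or fourfold) increase is read as reaching five (or four) times the baseline value, and mice that never cross the threshold are censored at their last measurement; both readings, and the input layout, are assumptions.

```python
LUNG_FOLD = 5.0   # fivefold rise in pulmonary bioluminescence
FLANK_FOLD = 4.0  # fourfold rise in subcutaneous tumour volume


def progression_event(series, fold):
    """Return (day, progressed) for one mouse.

    series: (day, measurement) pairs sorted by day; the first pair is the
    baseline (initial flux, or tumour volume at randomization). A mouse
    progresses on the first day its measurement reaches fold x baseline;
    otherwise it is censored at its last measurement.
    """
    baseline = series[0][1]
    for day, value in series[1:]:
        if value >= fold * baseline:
            return day, True
    return series[-1][0], False
```

The resulting `(day, progressed)` pairs for the vehicle and SD6 groups can then be passed to any standard survival-analysis routine.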

Investigators responsible for monitoring and measuring the xenografts of indi- 
vidual tumours were not blinded. Simple randomization was used to allocate 
animals to experimental groups. All animal studies were performed in accordance 
with institutional and national animal regulations. Animal protocols were 
approved by the Institutional Animal Care and Use Committee at Baylor 
College of Medicine. 

Power analysis, based on previous survival analyses in our laboratory, was used
to determine the sample size needed to detect significant changes in animal
survival. All animals were included in analyses.

Pooled shRNA screens in breast cancer cell lines. Pooled shRNA screens were 
performed on 68 breast cancer lines and four non-malignant immortalized
mammary epithelial lines, essentially as described. In brief, cells were infected with
a lentiviral shRNA library at an MOI of 0.3 and passaged under standard conditions.
After 4 and 8 doublings, DNA was isolated and hybridized to a customized
chip to assess shRNA dropout. A detailed description of the results of these screens
will be published separately (R.M., A. S. and B.G.N., manuscript in preparation).

Correlation between MYC dependency and spliceosome dependency. First,
to calculate MYC dependency scores using probable on-target hairpins, we used
assay observations associated with the 3 hairpins incorporated into the first ATARIS
solution for MYC. MYC dependency scores were generated using a hierarchical
linear model, with pooled shRNA screen observations as the dependent variable
and two regression covariates: initial signal intensity (with coefficient β0) and
linear time-course dropout trend (with coefficient β1). The dropout trend is
calculated for each cell line separately, resulting in a per-cell-line MYC dropout score
(the value of coefficient β1).
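To make the two coefficients concrete, the sketch below fits signal = β0 + β1 × time by ordinary least squares separately for each cell line, pooling the observations of the MYC hairpins; β1 is then that line's dropout score, with more negative values meaning stronger dropout. This is a deliberate simplification: the hierarchical linear model shares information across cell lines and hairpins, which per-line OLS does not. The data layout and the use of population doublings as the time axis are assumptions.

```python
def ols_line(times, signals):
    """Ordinary least squares for signal = b0 + b1 * time; returns (b0, b1)."""
    n = len(times)
    mean_t, mean_s = sum(times) / n, sum(signals) / n
    stt = sum((t - mean_t) ** 2 for t in times)
    b1 = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, signals)) / stt
    return mean_s - b1 * mean_t, b1


def dropout_scores(observations):
    """observations: {cell_line: [(doublings, signal), ...]} pooled over hairpins.

    Returns {cell_line: b1}, the per-line dropout trend.
    """
    scores = {}
    for line, obs in observations.items():
        times, signals = zip(*obs)
        scores[line] = ols_line(times, signals)[1]
    return scores
```

In a mixed-effect formulation the per-line slopes would instead be partially pooled towards a shared mean, which stabilizes estimates for noisy lines.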

Second, the MYC dependency score was used in a hierarchical linear model to
search for associations with the essentiality of other genes (such as spliceosome-
encoding genes). This model uses pooled shRNA screen observations as the
dependent variable and three regression covariates: initial assay signal intensity
(with coefficient β0), linear time-course dropout trend (with coefficient β1), and
an interaction term between dropout trend and MYC dependency score (with
coefficient β3). The P value associated with the interaction term β3 is used to determine
whether a significant association exists. A detailed description of this approach will
be published elsewhere (A.S., R.M. and B.G.N., manuscript in preparation).

A summary statistic using results from the single-gene analyses was used to test 
the significance of the association between MYC dependency and the essentiality 
of a gene set. For a gene set containing genes g, we calculate the gene set summary 
statistic as 


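$$-\sum_{g} \operatorname{sign}(\beta_{3,g})\,\log_{10}\bigl(P\ \mathrm{value}(\beta_{3,g})\bigr)$$

in which the sum runs over the genes $g$ in the set, and $\operatorname{sign}(\beta_{3,g})$ and $P\ \mathrm{value}(\beta_{3,g})$
denote the sign of, and the P value associated with, the regression coefficient $\beta_3$
for gene $g$. The resulting metric, termed a siMEM (mixed-effect model)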
score, indicates the significance and correlation between sensitivity to MYC 
shRNAs and sensitivity to a group of shRNAs targeting a gene set (such as those 
targeting the spliceosome). A gene set (for example, spliceosome genes) for which 
a substantial number of genes are significantly associated with MYC dependency, 
and all with the same direction (sign) of association, will have a large positive score. 
When calculated for the gene set consisting of the core spliceosome, this value 
summarizes the direction and strength of the significance observed across genes in 
the spliceosome. To determine whether this observation is significant, the same 
statistic is calculated for 100,000 randomly drawn gene sets of the same size as the 
core spliceosome, yielding the null distributions of gene set summary statistics in 
Fig. 4b, c. 
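Below is a minimal sketch of the gene set summary statistic and its permutation null. It assumes the per-gene signs and P values of β2 are already available. Gene names, values and the number of random draws are hypothetical (the text uses 100,000 random gene sets). The one-sided empirical P value uses the common (hits + 1)/(draws + 1) convention, which is an assumption rather than a stated detail of the method.

```python
import math
import random


def simem_score(genes, sign_by_gene, p_by_gene):
    """Gene set summary statistic: -sum_g sign(beta_2g) * log10(P value(beta_2g))."""
    return -sum(sign_by_gene[g] * math.log10(p_by_gene[g]) for g in genes)


def permutation_p(gene_set, all_genes, sign_by_gene, p_by_gene, n_draws=10_000, seed=0):
    """Compare the observed score with scores of random same-size gene sets.

    Returns (observed_score, empirical one-sided P value), using the
    (hits + 1) / (n_draws + 1) convention so the P value is never zero.
    """
    rng = random.Random(seed)
    observed = simem_score(gene_set, sign_by_gene, p_by_gene)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(all_genes, len(gene_set))
        if simem_score(draw, sign_by_gene, p_by_gene) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_draws + 1)


# Hypothetical inputs: 5 "core spliceosome" genes strongly and consistently
# associated with MYC dependency, and 95 background genes with no signal.
spliceosome = [f"S{i}" for i in range(1, 6)]
background = [f"G{i}" for i in range(1, 96)]
all_genes = spliceosome + background
sign_by_gene = {g: 1 for g in spliceosome}
sign_by_gene.update({g: (1 if i % 2 else -1) for i, g in enumerate(background)})
p_by_gene = {g: 1e-4 for g in spliceosome}
p_by_gene.update({g: 0.5 for g in background})

score, p_value = permutation_p(spliceosome, all_genes, sign_by_gene, p_by_gene, n_draws=2_000)
print(score, p_value)
```

Because background genes with opposite signs cancel, only a set whose genes agree in direction and are individually significant reaches a large positive score.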
Statistical analysis. All experiments were performed on biological replicates 
unless otherwise specified. Sample size for each experimental group/condition is 
reported in the appropriate figure legends and methods. For cell culture experi- 
ments, sample size was not predetermined, and all samples were included in 
analyses. For significance testing, tests were chosen only when the data met their assumptions. Data were checked for comparable variance before statistical analysis. Statistically significant differences between control and experimental groups were determined using two-tailed unpaired Student’s t-test, one-way ANOVA with Tukey–Kramer minimum significant difference test, Mann–Whitney test, Kolmogorov–Smirnov test, Wilcoxon test, permutation-based test of significance,
and log-rank test as indicated in the appropriate figure legend and methods text. 
Extended Data Figure 1 | Validation of BUD31 as a MYC-synthetic lethal gene in HMECs. a, qRT-PCR analysis of BUD31 mRNA level (mean ± s.d., n = 3 biological replicates). b, Clonogenicity of MYC-ER HMECs with or without MYC hyperactivation or BUD31 depletion (mean ± s.e.m., n = 4 biological replicates, **P < 0.01, two-tailed Student’s t-test). c, Caspase-3/7 activation by caspase luminescence assay (mean ± s.e.m., n = 3, ***P < 0.001, one-way ANOVA). d, Flag-tagged protein levels in MYC-ER HMECs, with vinculin as a loading control.
Extended Data Figure 2 | BUD31 interacts with core spliceosomal factors
and is required for spliceosomal assembly and pre-mRNA splicing. a, 134 
core spliceosomal proteins are listed. Proteins in red are shown to interact 
with BUD31, as discovered by Flag-BUD31 immunoprecipitation mass 
spectrometry and BUD31 BiFC. b, Heat map of BUD31-interacting 
spliceosomal proteins, organized by spliceosome sub-complexes. A black-green 
colour scale depicts normalized BiFC interaction values between spliceosomal 
proteins and negative control protein (technical replicates in two left lanes) 
and BUD31 (technical replicates in two right lanes). c, Spliceosomal snRNPs 
(coloured circles) interact in a stepwise manner to excise intronic sequences 
from pre-mRNA. snRNPs with proteins identified from the BUD31 immuno- 
precipitation and mass spectrometry are noted (blue outline) to be BUD31- 
associated. d, Co-immunoprecipitation of Flag-BUD31 for non-spliceosomal 
proteins. Input and immunoprecipitation blots probed for EIF2S1 and EIF3I
were taken at different exposures to minimize background signal. e, Interaction 
between N-YFP-tagged BUD31 and C-YFP-tagged spliceosomal (DDX46) or 
cytoplasmic proteins (TRIM9, SOCS2 and EPHA8) was assessed by cellular
fluorescence (mean ± s.e.m., n = 3 technical replicates). f, Nuclear extracts
with or without BUD31 knockdown were incubated with pre-mRNA substrate, 
and RT-PCR of unspliced RNA (top) and spliced RNA (bottom) was 
performed, using primers at the indicated arrows (left). BUD31 protein levels in 
the nuclear extracts were normalized to vinculin expression (middle) and 
quantified (right). g, Radioactively labelled pre-mRNA (MINX) was incubated 
with nuclear extracts with or without BUD31 depletion. RNA purified from 
the splicing reaction was run on a denaturing gel and imaged by autoradio- 
graphy. The identities of prominent bands are based on size. Asterisk denotes 
putative intron-lariat band. h, After in vitro splicing was performed as 
described previously, products were electrophoresed on native gel, and 
spliceosome complexes were visualized by autoradiography. Complex A and 
nonspecific H complexes are labelled. i, Phosphorimager quantification of the 
ratio of RNA in complex A compared to that in complex H. j, Interaction 
between N-YFP-tagged wild-type (WT) or mutant BUD31 and C-YFP-tagged 
splicing factors was assessed by cellular fluorescence (mean ± s.e.m., n = 2
technical replicates, ***P < 0.001, two-tailed Student’s t-test). 
Extended Data Figure 3 | HMECs with oncogenic activation of HER2 and EGFR do not require BUD31. a, Cell number changes in HMECs with inducible shBUD31 and constitutive HER2 or EGFR expression (mean ± s.e.m.; n = 4 technical replicates; *P < 0.05, two-tailed Student’s t-test). HER2 and EGFR protein levels are normalized to vinculin (right). b, MYC protein levels in HMECs with constitutive HER2 or EGFR expression. c, MYC induction by tamoxifen in MYC-ER HMECs does not increase cell proliferation over time (mean ± s.e.m., n = 8 technical replicates).
Extended Data Figure 4 | Partial knockdown of core splicing factors is MYC-synthetic lethal in HMECs. a–d, mRNA levels for core splicing factors SF3B1 (a), U2AF1 (b), EFTUD2 (c) and SNRPF (d) were evaluated by qRT-PCR (mean ± s.d., n = 3 technical replicates). e–i, Caspase-3/7 luminescence in MYC-ER HMECs with partial suppression of core spliceosomal proteins (e–h) or treatment with spliceosome inhibitor SD6 (i) (mean ± s.e.m., n = 3 technical replicates, ***P < 0.001, one-way ANOVA).


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Extended Data Figure 5 panels: a, % change in cellular poly(A)+ fluorescence after actinomycin D; b, % change in nuclear poly(A)+ fluorescence after actinomycin D; Control and MYC cells, −shBUD31 versus +shBUD31; N.S., not significant.]


Extended Data Figure 5 | BUD31 loss in MYC-hyperactivated cells 
destabilizes mRNA. a, b, MYC-ER HMECs with inducible shBUD31 treated 
with actinomycin D for 5 h were labelled with oligo(dT) LNA probes via fluorescence in situ hybridization. FITC intensity was assessed within cellular (a) and nuclear (DAPI+) (b) regions. Data are represented as the difference in FITC intensity between 0 and 5 h of actinomycin D treatment in each cell state (mean ± s.e.m., n = 150, ***P < 0.001, two-tailed Student’s t-test).
Extended Data Figure 6 | BUD31 depletion in MYC-hyperactivated cells enhances intron retention and decreases expression of cell-essential genes. In MYC-hyperactive cells, 17 representative genes display increased intron retention (IR) and decreased steady-state RNA levels after BUD31 knockdown. Depletion of these genes by shRNA decreased cell viability (mean barcode abundance ± s.e.m.). A twofold decrease in barcode abundance is noted by the dashed red line. All values reflect three biological replicates, and genes are colour-coded based on their Gene Ontology term annotation.
Extended Data Figure 7 | MYC-dependent breast cancer cells require BUD31 for in vitro and in vivo growth. a, Relative cell number of SUM159 cells with doxycycline (dox)-inducible shBUD31 in vitro (mean ± s.e.m., n = 8 technical replicates, ***P < 0.001, two-tailed Student’s t-test). b, Caspase-3/7 luminescence in BUD31-depleted SUM159 cells (mean ± s.e.m., n = 3 technical replicates, ***P < 0.001, two-tailed Student’s t-test). c, d, SUM159 cells engineered with dox-inducible shBUD31 were subcutaneously transplanted into mice and randomized onto dox treatment (−dox, n = 10; +dox, n = 9). Loss of BUD31 in SUM159 xenografts inhibits tumour growth (c; mean ± s.e.m., ***P < 0.001 at day 21, two-tailed Student’s t-test) and prolongs progression-free survival (d; P value by log-rank test) in nude mice.
Extended Data Figure 8 | BUD31 depletion does not affect levels of MYC protein. a, MYC protein levels in MYC-ER HMECs with inducible shBUD31 expression, normalized to vinculin expression. To confirm specificity of the MYC antibody, HMECs without the MYC-ER construct were engineered to express inducible MYC shRNA. b, MYC protein levels in SUM159 and LM2 cells with inducible shBUD31, normalized to vinculin expression. To confirm specificity of the MYC antibody, SUM159 cells were engineered to express inducible MYC shRNA.
Extended Data Figure 9 | Schematic for in vivo barcode-based competition
assay. LM2 cells transduced with inducible shRNAs targeting negative 
control genes or candidate genes were mixed at an equal ratio. This mixed 
population was transplanted into mice, and tumours were allowed to form in 
the presence or absence of dox. At the experimental endpoint, genomic DNA 
was isolated for comparisons of relative barcode (shRNA) abundance in 
tumour genomic DNA. 
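The sketch below shows one way to summarize such a barcode competition readout. It assumes a simple normalization that may differ from the authors' pipeline. First, barcode read counts are converted to fractions within each condition. Next, each shRNA's +dox fraction is divided by its −dox fraction. Finally, that ratio is normalized to the mean of the negative-control shRNAs. A normalized value at or below 0.5 corresponds to the twofold-depletion threshold drawn in Extended Data Fig. 6. The shRNA names and read counts are hypothetical.

```python
def relative_abundance(counts_minus_dox, counts_plus_dox, controls):
    """Return {shRNA: +dox/-dox barcode-fraction ratio, normalized to controls}.

    counts_* map each shRNA barcode to its read count in tumour genomic DNA.
    Counts are first converted to fractions of all reads in that condition,
    then each ratio is divided by the mean ratio of the negative-control shRNAs.
    """
    total_minus = sum(counts_minus_dox.values())
    total_plus = sum(counts_plus_dox.values())
    ratio = {
        sh: (counts_plus_dox[sh] / total_plus) / (counts_minus_dox[sh] / total_minus)
        for sh in counts_minus_dox
    }
    control_mean = sum(ratio[c] for c in controls) / len(controls)
    return {sh: r / control_mean for sh, r in ratio.items()}


def depleted(rel, threshold=0.5):
    """shRNAs whose normalized abundance fell at least twofold."""
    return sorted(sh for sh, r in rel.items() if r <= threshold)


# Hypothetical read counts for two control and two candidate shRNAs.
minus_dox = {"shCtrl1": 1000, "shCtrl2": 1000, "shGeneA": 1000, "shGeneB": 1000}
plus_dox = {"shCtrl1": 1500, "shCtrl2": 1500, "shGeneA": 300, "shGeneB": 1200}
rel = relative_abundance(minus_dox, plus_dox, controls=["shCtrl1", "shCtrl2"])
print(depleted(rel))
```

Normalizing to the controls removes shifts in total barcode representation that affect every shRNA equally, such as the expansion of control clones when a depleting shRNA drops out of the mix.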
Extended Data Figure 10 | Spliceosome inhibitor SD6 inhibits MYC-dependent cancer cells in vitro and in vivo. a, MYC-dependent breast cancer cells (SUM159 and LM2) and MYC-normal immortalized epithelial cells (F7 and HME1) were cultured with SD6 at low density and analysed for clonogenic growth. b, MYC-repressible human B-cell line P493-6 was treated with or without 100 nM SD6 in the absence or presence of MYC hyperactivation for four days, and cells were counted for relative cell number changes (mean ± s.e.m., n = 3 biological replicates, ***P < 0.001, one-way ANOVA). c, Kaplan–Meier survival analysis of nude mice with pulmonary seeding of LM2 cells treated with or without SD6 for 10 days (vehicle n = 7, SD6 n = 6, P value by log-rank test).
Replisome speed determines the efficiency of the
Tus–Ter replication termination barrier


Mohamed M. Elshenawy, Slobodan Jergic, Zhi-Qiang Xu, Mohamed A. Sobhy, Masateru Takahashi, Aaron J. Oakley,
Nicholas E. Dixon & Samir M. Hamdan


In all domains of life, DNA synthesis occurs bidirectionally from replication origins. Despite variable rates of replication fork progression, fork convergence often occurs at specific sites. Escherichia coli sets a ‘replication fork trap’ that allows the first arriving fork to enter but not to leave the terminus region. The trap is set by oppositely oriented Tus-bound Ter sites that block forks on approach from only one direction. However, the efficiency of fork blockage by Tus–Ter does not exceed 50% in vivo despite its apparent ability to almost permanently arrest replication forks in vitro. Here we use data from single-molecule DNA replication assays and structural studies to show that both polarity and fork-arrest efficiency are determined by a competition between rates of Tus displacement and rearrangement of Tus–Ter interactions that leads to blockage of slower moving replisomes by two distinct mechanisms. To our knowledge this is the first example where intrinsic differences in rates of individual replisomes have different biological outcomes.

In the circular E. coli chromosome, two replication forks move from the replication origin and converge, opposite the origin, in a region that contains ten 23-base-pair Ter (termination) sites and the dif site for chromosome segregation (Fig. 1a). The Ter sites are arranged in two oppositely
Figure 1 | Fate of the E. coli replisome upon encountering Tus–TerB.

a, Polarity of the replication fork trap. Replication occurs bidirectionally from 
oriC. Each fork passes through the first five permissive (P) Ter sites, but is 
arrested on encounter with one of the next five non-permissive (NP) sites. 

b, Structure of the Tus–Ter locked complex (PDB: 2106). Strand separation of
C6 (yellow) at the NP face induces its flipping into a specific binding pocket on 
Tus. c, Schematic of the single-molecule setup for observing leading-strand 
synthesis, which converts the tethered dsDNA (long) to ssDNA (short), 
displacing the bead opposite the flow. d, Representative synthesis trajectories 
upon encountering Tus-TerB oriented to the P (PTerB, left) or NP (NPTerB, 
right) faces. kbp, kilobase pairs. e, Percentages of forks that bypassed, 
transiently or fully stopped at the P or NP face. Error bars correspond to 
standard deviations of binomial distributions; N = 60, 64 and 37 for NPTerB
(−Tus), NPTerB (+Tus) and PTerB (+Tus), respectively. f, Effect of Tus
concentration on arrest activity at the NP face. Tus was present continuously 
with the replication proteins. Washing excess DNA-unbound Tus (80 nM) 
before introduction of replication proteins resulted in 38% stoppage. g, Rate 
dependence of replication stalling at the NP face. Rate distributions of events 
that bypassed (grey; N = 33) or stopped/restarted (blue bars; N = 31) were fit 
with Gaussian distributions. Fit lines are shown; the uncertainty corresponds to 
the standard error. h, Percentages of forks that bypassed, transiently or fully 
stopped at Tus bound to the NPTerB site containing a bubbled-DNA structure 
in place of base pairs 3–7 of TerB, while keeping C6 (5-mismatch C6-NPTerB).
Error bars correspond to standard deviations of binomial distributions; N = 14 
and 27 in absence and presence of Tus, respectively. 
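The binomial error bars can be reproduced from counts. If k of N molecules show an outcome, the percentage is 100·k/N and its binomial standard deviation is 100·sqrt(p(1 − p)/N), with p = k/N. The counts below are an assumption inferred from Table 1 and Fig. 1g: 29 stops, 33 bypasses and 2 restarts out of N = 64 for NPTerB with Tus, where the 31 stopped/restarted events in Fig. 1g equal 29 + 2.

```python
import math


def binomial_percent(k, n):
    """Percentage k/n and its binomial standard deviation, both in percent."""
    p = k / n
    return 100 * p, 100 * math.sqrt(p * (1 - p) / n)


# Inferred counts for NPTerB (+Tus), N = 64 molecules.
for label, k in [("stop", 29), ("bypass", 33), ("restart", 2)]:
    pct, sd = binomial_percent(k, 64)
    print(f"{label}: {pct:.0f} ± {sd:.0f}%")
```

Rounded, this reproduces the 45 ± 6%, 52 ± 6% and 3 ± 2% listed for NPTerB with wild-type Tus.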
oriented groups, and each of them is tightly bound by the monomeric protein Tus. The lack of symmetry in Ter sequences fixes the orientation of the Tus–Ter complex such that forks are blocked at its non-permissive (NP) face, but allowed to pass from the permissive (P) end. The two Ter clusters thus form a trap from which the first arriving fork can enter but not leave, awaiting arrival of the other.

The mechanism determining polarity of Tus-Ter action serves as a 
model for communication between replication forks and double- 
stranded (ds) DNA-binding proteins, but it is also controversial. 
Strand separation by the DnaB helicase at the NP face can engineer a new structure, the ‘locked’ complex of the mousetrap model (Fig. 1b). Cytosine(6) of Ter flips out of the DNA helix to form new interactions in a pocket on Tus that markedly prolong the lifetime (>40-fold) of the Tus–Ter complex, protecting the central interactions from the trailing polymerase. Conversely, strand separation at the P face rapidly dissociates Tus.

Despite the stability of the locked complex in vitro, at any sampling time in reporter plasmids in vivo, even when Tus is overproduced, ~50% of forks moving towards the NP face displace Tus. This could be due either to the fork block being transient or to its low efficiency of formation. The KD of the Tus–TerB locked complex is only threefold lower than that of Tus–dsTerB, whereas its lifetime is much longer, so we tested the hypothesis that the efficiency of lock formation is kinetically controlled; that is, that NP fork-arrest efficiency is determined by competition between lock formation and Tus displacement, dependent on the rate of fork approach. Inherent inefficiency of fork arrest would also explain the presence of backup Ter sites in the chromosome.
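The kinetic reasoning can be made explicit with a worked equation. It assumes simple 1:1 binding, so that KD = k_off/k_on and the complex lifetime is 1/k_off. This is a simplification, because the lock forms on a forked rather than a fully duplex substrate. Using the threefold lower KD and the >40-fold longer lifetime of the locked complex stated above:

$$ \frac{k_{\mathrm{on}}^{\mathrm{lock}}}{k_{\mathrm{on}}^{\mathrm{ds}}} = \frac{k_{\mathrm{off}}^{\mathrm{lock}}}{k_{\mathrm{off}}^{\mathrm{ds}}} \cdot \frac{K_D^{\mathrm{ds}}}{K_D^{\mathrm{lock}}} < \frac{1}{40} \times 3 = 0.075 $$

Under these assumptions, overall association into the locked state is at least about 13-fold slower than Tus binding to dsTerB. That slowness is what makes a kinetic competition with an approaching fork plausible.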
We used single-molecule imaging to monitor the fate of the E. coli leading-strand replisome as it approaches Tus–TerB from either direction. Real-time synthesis trajectories were derived from multiplexed arrays by monitoring the length of individual DNA molecules. The forked primer-template DNAs, each with a single TerB site 3.6 kilobases (kb) from the site of fork assembly, were tethered between the surface of a coverslip and a magnetic bead (Fig. 1c and Extended Data Fig. 1) and extended by a laminar flow exerting a 2.6 pN drag force on the beads. The trajectories (Fig. 1d) show DNA shortening through its conversion from double-stranded (long) to single-stranded (short) DNA during leading-strand synthesis. The position of TerB could be defined to ±0.1 kb under this force regime (see Methods). Consistent with previous single-molecule studies of DNA replication, rates of DNA synthesis vary among replisomes (Extended Data Fig. 2), reflecting the in vivo situation. The source of this heterogeneity is unknown (Supplementary Discussion 1).

In the absence of Tus, 5 ± 3% of forks that reached the TerB site stopped there by chance (Table 1, Fig. 1e, Extended Data Fig. 3a, b and Supplementary Discussion 2). With Tus–TerB oriented with its P face towards the fork (PTerB), this frequency increased to 11 ± 5% (including the 5% random stoppage), presumably owing to forks encountering a strong protein–DNA roadblock (Fig. 1e and Supplementary Discussion 3). Transient stoppage followed by resumption of synthesis occurred in 5% of trajectories, and in the remaining 84%, replication forks displaced Tus and continued synthesis without stopping, even transiently (Fig. 1d, e). The average rate of DNA synthesis was otherwise unaffected by Tus (Extended Data Fig. 3c).


Table 1 | Fate of replisomes and fork rate dependencies of events at Tus-bound Ter sites 


TerB: 5′-AATAAGTATGTTGTAACTAAAGT-3′

Substrate | Tus | Stop (%) | Bypass (%) | Restart (%) | Pause time (s) | Stop/restart rate (bp s−1) | Bypass rate (bp s−1)
PTerB | WT Tus | 11 ± 5 | 84 ± 6 | 5 ± 4 | 31 ± 24 | 330 ± 30 | 1,300 ± 110 (1,250 ± 120)
NPTerB | No Tus | 5 ± 3 | 95 ± 3 | 0 | – | 800 ± 100 | 1,160 ± 70 (930 ± 70)
NPTerB | WT Tus | 45 ± 6 | 52 ± 6 | 3 ± 2 | 146 ± 31 | 890 ± 70 (840 ± 40) | 1,690 ± 100 (1,700 ± 140)
NPTerB | H144A | 27 ± 8 | 55 ± 9 | 18 ± 7 | 33 ± 5 | 840 ± 120 (740 ± 10) | 1,920 ± 130 (1,520 ± 120)
NPTerB | R198A* | 5 ± 5 | 77 ± 9 | 18 ± 8 | 14 ± 4 | 400 ± 40 | 1,300 ± 140 (1,310 ± 70)
CG(6)-NPTerB | WT Tus | 8 ± 4 | 47 ± 8 | 45 ± 8 | 29 ± 6 (37 ± 6) | 820 ± 70 (720 ± 80) | 1,810 ± 110 (1,740 ± 60)
CG(6)-NPTerB | WT Tus† | 18 ± 9 | 41 ± 11 | 41 ± 11 | 56 ± 13 | 800 ± 80 (790 ± 70) | 1,820 ± 180
TA(6)-NPTerB | WT Tus | 7 ± 4 | 73 ± 7 | 20 ± 6 | 22 ± 5 (24 ± 2) | 380 ± 60 (360 ± 40) | 1,350 ± 110 (1,310 ± 50)
5-mismatch C6-NPTerB | WT Tus | 89 ± 6 | 11 ± 6 | 0 | – | 1,160 ± 130 (1,030 ± 130) | 1,230 ± 310
5-mismatch C6-NPTerB | R198A* | 86 ± 7 | 14 ± 7 | 0 | – | 1,130 ± 120 (950 ± 170) | 1,200 ± 270
5-mismatch G6-NPTerB | WT Tus | 8 ± 5 | 15 ± 7 | 77 ± 8 | 111 ± 18 (177 ± 20) | 1,130 ± 140 (960 ± 170) | 1,230 ± 170
TA(5)-NPTerB | WT Tus | 32 ± 9 | 68 ± 9 | 0 | – | 390 ± 70 (360 ± 40) | 1,480 ± 140 (1,320 ± 60)
Swapped F4n GC(6)-NPTerB | WT Tus | 41 ± 9 | 59 ± 9 | 0 | – | 880 ± 110 (860 ± 90) | 1,480 ± 120 (1,550 ± 110)
Swapped F5n GC(6)-NPTerB | WT Tus | 22 ± 7 | 75 ± 7 | 3 ± 3 | 186 | 400 ± 60 (310 ± 70) | 1,260 ± 100 (1,260 ± 60)
NPTerH | WT Tus | 19 ± 6 | 70 ± 8 | 11 ± 5 | 180 ± 26 | 400 ± 60 (350 ± 20) | 1,320 ± 100 (1,300 ± 60)


The TerB site and its alterations are depicted in cartoon with the native nucleotides in black except that C6 is in yellow and substituted nucleotides are in magenta. The directionality of the replication fork is shown 
with the strands on which Pol III holoenzyme and DnaB translocate. The nucleotides in native TerH that differ from TerB are also shown in magenta. The sequences of oligonucleotides used to assemble the variants 
of TerB substrates are given in Extended Data Fig. 1b. 

Stop, bypass and restart events were quantified as a percentage of all events that reached or bypassed TerB (see Methods). Uncertainties correspond to standard deviations of binomial distributions. The mean of 
the rates and pause durations are shown either as their arithmetic averages or, in parentheses, by fitting their histograms with Gaussian or exponential decay distributions, respectively (see Fig. 2a, c and Methods). 
The uncertainties correspond to standard errors. Each experimental condition represents results from three or four technical replicates and the number of derived molecules (N) is specified in the corresponding 
figures. 

*Concentration of Tus(R198A) was 250 nM versus the standard 80 nM. 

†Reaction from which the restart proteins PriA, PriB and DnaT were omitted.
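The table gives two summaries of pause durations: an arithmetic mean and a value from a fit to the histogram. The sketch below illustrates the second kind of summary by fitting N(t) = A·exp(−t/τ) to binned counts with unweighted linear least squares on log-counts. This log-linear fit is a simple swapped-in technique, and the fitting procedure used for Table 1 may differ. The bin centres and counts are synthetic and noise-free, generated with τ = 37 s so that the fit should return that value.

```python
import math


def fit_exponential_histogram(bin_centres, counts):
    """Fit counts = A * exp(-t / tau) by least squares on log(counts).

    Empty bins are skipped because log(0) is undefined. Returns (A, tau).
    """
    pts = [(t, math.log(c)) for t, c in zip(bin_centres, counts) if c > 0]
    n = len(pts)
    t_bar = sum(t for t, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in pts)
             / sum((t - t_bar) ** 2 for t, _ in pts))
    intercept = y_bar - slope * t_bar
    return math.exp(intercept), -1.0 / slope


centres = [10, 30, 50, 70, 90, 110]                # bin centres in seconds
counts = [50 * math.exp(-t / 37) for t in centres]  # synthetic, noise-free
A, tau = fit_exponential_histogram(centres, counts)
print(f"tau = {tau:.1f} s")
```

With real, sparse histograms, Poisson-weighted or maximum-likelihood fitting would usually be preferred, because the unweighted log transform over-weights bins with few counts.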
In contrast, when the fork approached the NP face (NPTerB), permanent stoppage (9.7 ± 1 min) of DNA synthesis occurred in 45% of trajectories, and restart in only 3% (Fig. 1d, e and Extended Data Fig. 2a, b). The remaining 52% showed no sign even of transient stoppage. All TerB sites were Tus-bound under our experimental conditions (Fig. 1f), indicating that the Tus–TerB complexes have an inherently low efficiency of fork arrest. We are thus able for the first time to distinguish between the two different mechanisms that could explain the in vivo data.

Fork arrest is attenuated in vivo by DNA supercoiling, suggesting that it is affected by the rate of strand separation. To test this proposition, we separated the trajectories that showed full or transient stoppage from those that did not and found that the rate of DNA synthesis and fork bypass were correlated (r = 0.62; Extended Data Fig. 4a): fast forks were arrested less often than slower ones. In fact, there was a twofold difference in average rates of synthesis between forks that stopped and those that bypassed TerB (Fig. 1g). DNA synthesis at individual forks before stoppage at TerB, or in full trajectories where they bypassed it, progressed at nearly constant rates within our spatial and temporal resolution (Extended Data Fig. 4b–e). As we showed previously, the overall average rate reproduces the average in vivo rate (~950 bp s−1). This underscores the importance of achieving the in vivo rate of DNA synthesis for reproducing the ~50% efficiency of fork arrest seen in vivo.
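A correlation between a continuous fork rate and a binary outcome (bypass versus stop) is a point-biserial correlation. This is numerically the Pearson r computed with the outcome coded as 0 or 1. The sketch below assumes that coding (1 = bypass). Whether r = 0.62 was computed exactly this way is an assumption, and the rates and outcomes are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical fork rates (bp/s) and outcomes at NPTerB (1 = bypass, 0 = stop).
rates = [600, 800, 900, 1000, 1200, 1500, 1700, 2000]
bypassed = [0, 0, 1, 0, 1, 1, 1, 1]
print(round(pearson_r(rates, bypassed), 2))
```

A positive r here means that faster forks are more likely to bypass, which is the direction reported above.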

The rate dependence of stoppage supports the hypothesis that 
strand separation competes with inefficient C6 flipping. To dem- 
onstrate this, we pre-formed the locked complex before replisome 
assembly using TerB with a mismatched bubble in place of base pairs 
3–7 while keeping an unpaired C6 (Fig. 1h and Table 1). The yield of
fork arrest increased to 89%; thus, once the lock is established, it is a 
very effective fork block. 

We next interrogated the role of lock formation with C6-defective TerB mutants. Surprisingly, the GC(6) to CG substitution did not lead to ~95% bypass. Instead, it resulted in transient (for 37 ± 6 s) rather than permanent blockage, again in ~50% of trajectories (Fig. 2a, b and Extended Data Fig. 2c). Moreover, the fork rate dependence of pausing was similar to that with the normal lock (Fig. 2c). DnaB remained at the fork during transient arrest, since DNA synthesis could restart in the absence of helicase reloading proteins (Table 1). The crystal structure of a Tus complex with a forked Ter containing an unpaired G replacing C6 showed that the substituted G6 base neither bound in the cytosine pocket nor formed any new specific interaction with Tus (Fig. 2d and Extended Data Table 1). This remained the case even when the fork was extended to also disrupt the TA(7) base pair (Extended Data Fig. 5a–c). Thus the fork-rate-dependent step producing transient stoppage must precede engagement of C6 in its binding pocket.

In the Tus crystal structures, the α6/L3/α7 region has extensive interactions with the lagging strand (Fig. 1b) before and after C6 lock formation. This poses a paradox: how does the lagging-strand-translocating DnaB in fast-approaching replisomes disrupt these interactions without even pausing? The main sequence-specific contacts this region makes with the first 6 bp of dsTer are via Arg198 in L3 with the A5 and G6 bases on the lagging strand and T5 on the leading strand (Extended Data Fig. 5e), but these interactions are not present in the locked complex, where Arg198 makes a new salt bridge to the phosphate between lagging-strand nucleotides 6 and 7 (Extended Data Fig. 5a). We suggest that the Arg198 side chain forms transient interactions with G6, the TA(5) base pair and the lagging-strand phosphate, holding the two DNA strands together before strand separation reaches GC(6) (Extended Data Fig. 5e). Moreover, comparison of the structures of Tus with the wild-type and CG(6) mutant dsTer sites (Extended Data Table 1) suggests rearrangement of lagging-strand interactions with Arg198 (Extended Data Fig. 5e, f). We propose that Arg198–DNA contacts rearrange substantially during strand separation. This provides a window of opportunity for the fast-moving DnaB to break into the Tus–Ter central interactions before Arg198 rearrangement or C6 base flipping occurs.
Figure 2 | Characterization of transient stoppage of the replication fork at
the non-permissive face of Tus-TerB before C6 base flipping. 

a, Representative trajectories of restart of DNA synthesis after transient 
stoppage at a Tus-bound TerB site where the GC(6) base pair was swapped to 
CG(6) (CG(6)-NPTerB). The distribution of pause durations was fit with a 
single exponential decay; the fit line is shown and uncertainty corresponds to 
the standard error (N = 16). b, Percentages of the populations of replication 
forks that bypassed, transiently stopped or fully stopped at CG(6)-NPTerB. 
Error bars correspond to standard deviations of binomial distributions
(N = 38). c, Rate dependence of replication restart at CG(6)-NPTerB. The rate
distribution of leading-strand synthesis for events that bypassed (grey; N = 18) 
or stopped/restarted (blue bars; N = 20) at CG(6)-NPTerB are presented as in 
Fig. 1g. d, Crystal structure of Tus with a forked Ter sequence that has a 
substituted G base in the C6 position in the locked complex (see also Extended 
Data Fig. 5b). The G base, highlighted in blue, was neither docked into the 
cytosine-binding pocket nor forming any new interactions with Tus. 
Highlighted nucleotides (at bottom) were not visible in the structure. e, Fates of 
replication forks at Tus bound to the NP face of a TerB site containing a 
bubbled-DNA structure in place of base pairs 3–7 in TerB and with G replacing
C6 (5-mismatch G6-NPTerB). Error bars correspond to standard deviations of 
binomial distributions (N = 26). 
To test this proposition, we used a bubble substrate with altered C6 (5-mismatch G6-NPTerB; see Table 1) to eliminate lock formation but allow rearrangement of interactions on the separated strands before arrival of DnaB. We observed efficient transient stoppage that reached 77%, with a long pause duration of 177 ± 20 s (Fig. 2e; Extended Data Fig. 6). The fivefold increase in pause duration compared to CG(6)-NPTerB (Table 1) is probably owing to interactions of the unpaired seventh nucleotide, as in the locked complex structure. Thus, strand separation beyond GC(6) in the absence of the C6 lock would impose only transient fork stoppage.

We next altered A5 and T5 alone and in the context of the first 5 bp 
of TerB (Table 1); this resulted in the largest decrease in yield of 
stoppage and shift in the rate-dependence of arrest to lower values, 
Figure 3 | Model of Tus–Ter polar arrest activity at the non-permissive face.
Prior to strand separation, Arg198 makes base-specific contacts with A5 and G6 
on the lagging strand to protect Tus-Ter central interactions from the DnaB 
helicase. After separation of the first six base pairs, Arg198 maintains contacts 
with the lagging strand by rearranging its interactions to make a new salt bridge 
to the phosphate between A5 and G6 and a new unidentified base-specific 
interaction is induced with T5. Competition between rates of strand separation 
and rearrangement of Arg198 interactions determines Tus-Ter efficiency. 


underscoring that A5 and/or T5 are the primary contributors in this region to the rate dependence (Extended Data Fig. 7a–c and Supplementary Discussion 4). AT(5) is conserved in strong Ter sites but not at weaker ones like TerH. We also found that the NP Tus–TerH complex stops forks with a rate dependence shifted to low values, similar to TA(5)-NPTerB (Table 1). However, one-third of the stopped forks restarted synthesis after a pause of 180 ± 26 s (Extended Data Fig. 7g, h), which we attribute to other alterations in TerH that weaken its binding in its locked form.

Substitution of wild-type GC(6) with CG has a modest effect on Tus binding to dsTerB in comparison with an AT or TA substitution. We observed that, relative to CG(6)-NPTerB, the TA substitution resulted in transient fork stoppage with decreased yield (Table 1 and Fig. 2a–c; Extended Data Fig. 7d–f), demonstrating the importance of the specific interaction of Tus with the native G6 for transient stoppage.

We then altered Arg198 itself. The R198A mutant interacts with dsTerB with a 140-fold increased KD, but only a twofold shorter lifetime. We showed by surface plasmon resonance (SPR) that Tus(R198A) can form a lock (Extended Data Fig. 8a–d), but it was very defective in fork arrest (Table 1); stoppage was inefficient (18%) and transient (pauses of 14 ± 4 s; N = 4). Nevertheless, preforming the locked complex with R198A on the 5-mismatch C6-NPTerB substrate restored efficient stoppage (Table 1), consistent with lock formation. These results suggest that C6 flipping cannot occur unless Arg198 interactions slow down or transiently stop the fork beforehand.
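The R198A binding numbers can be decomposed under the same simplifying 1:1 binding assumption (KD = k_off/k_on, lifetime = 1/k_off). On that assumption, the mutation acts mainly on association:

$$ \frac{k_{\mathrm{on}}^{\mathrm{R198A}}}{k_{\mathrm{on}}^{\mathrm{WT}}} = \frac{k_{\mathrm{off}}^{\mathrm{R198A}}}{k_{\mathrm{off}}^{\mathrm{WT}}} \cdot \frac{K_D^{\mathrm{WT}}}{K_D^{\mathrm{R198A}}} \approx 2 \times \frac{1}{140} = \frac{1}{70} $$

The 140-fold weaker affinity would therefore come mostly from roughly 70-fold slower association with dsTerB, with dissociation only about twofold faster.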
Faster moving forks have higher probability to separate GC(6) before
rearrangement of Arg198 interactions, leading to effective displacement of Tus 
(probability A). The slower forks are either stopped permanently before or 
during GC(6) melting (probability B) or transiently if GC(6) is melted and 
Arg198 succeeds in rearranging its interactions (probability C, step 1). The C(6) 
mousetrap acts as a terminal step that is enabled by the transient stoppage to 
impose permanent fork arrest (probability C, step 2). 


We have thus revealed two separate processes: one leading to a transient stoppage that precedes, but is probably on the pathway to, C6 lock formation, and one that leads directly to bypass and Tus dissociation. Previous results have suggested the operation of an uncharacterized C6-lock-independent arrest mechanism. Our study shows that this mechanism must be invoked before or as GC(6) is melted, because permanent stoppage was not achieved when GC(6) was melted in the absence of the C6 lock (Fig. 2e). To explore whether the interactions of Arg198 with AT(5) and G6 contribute directly to this alternate mechanism, we maintained these interactions using the TerB sequence and deactivated the C6 lock using Tus(H144A), which mutates the key residue in the binding pocket. This mutation completely eliminated lock formation (SPR in Extended Data Fig. 8e, f; X-ray structure in Extended Data Fig. 5d). However, we still observed a high level (27%) of permanent fork arrest, confirming the existence of a lock-independent process leading to permanent stoppage. There were also significant restarts (18%; Table 1) after short pauses (33 ± 5 s; N = 6). Pausing must result from a mechanism additional to permanent arrest, since restarts would otherwise be randomly distributed over the full 10 min period of observation. The rate dependence of arrest was similar to that with wild-type Tus (Table 1 and Extended Data Fig. 9).

Collectively, our results show that interactions of Arg198 of Tus with
G6, A5 and/or T5 act to protect Tus-Ter central interactions from the
first arriving DnaB. Nevertheless, these gatekeeping interactions are
dynamic during separation of the first 6 bp, and their rearrangement
occurs in competition with strand separation. We suggest that faster
forks have a higher probability of separating GC(6) before rearrangement
of Arg198 interactions, displacing Tus without pausing (Fig. 3, probability
A). Slower forks are stopped either permanently, before GC(6) melting
(probability B), or transiently, if GC(6) is melted and Arg198 succeeds
in rearranging (probability C). The inefficient C6 mousetrap is a terminal
step, enabled by transient stoppage, that imposes permanent fork
arrest (probability C). These results explain why, in
helicase assays, the slowly moving DnaB (35-390 bp s⁻¹) is efficiently
stopped at the NP face without requiring C6 flipping.
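The kinetic competition proposed above can be illustrated with a toy race between two exponential waiting times. This is a hypothetical sketch, not the authors' model: it assumes that separation of the single GC(6) base pair is exponential with a rate equal to the fork rate (bp s⁻¹ divided by 1 bp), and that Arg198 rearranges at a fixed rate whose value (`k_rearrange_s` = 300 s⁻¹ below) is invented purely for illustration.

```python
"""Toy race between GC(6) separation and Arg198 rearrangement (hypothetical)."""
import random


def p_bypass(fork_rate_bp_s: float, k_rearrange_s: float) -> float:
    """Probability that GC(6) separation finishes before Arg198 rearranges.

    For two independent exponential waiting times with rates k_sep and k_r,
    P(separation first) = k_sep / (k_sep + k_r). Here k_sep is assumed to
    equal the fork rate, and k_rearrange_s is a hypothetical constant (1/s).
    """
    k_sep = fork_rate_bp_s
    return k_sep / (k_sep + k_rearrange_s)


def simulate_bypass(fork_rate_bp_s, k_rearrange_s, n=50_000, seed=1):
    """Monte Carlo check of p_bypass by drawing both waiting times directly."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        t_sep = rng.expovariate(fork_rate_bp_s)    # time to separate GC(6)
        t_rearr = rng.expovariate(k_rearrange_s)   # time for Arg198 to rearrange
        wins += t_sep < t_rearr                    # True counts as 1
    return wins / n


if __name__ == "__main__":
    for rate in (200, 400, 600):  # illustrative fork rates, bp/s
        print(rate, round(p_bypass(rate, 300.0), 3), simulate_bypass(rate, 300.0))
```

Under these assumptions the predicted bypass probability rises monotonically with fork rate, which is the qualitative trend described in the text; the numbers themselves carry no quantitative meaning.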

Thus, we refine the mousetrap model and redefine the efficiency of 
Tus-Ter polar arrest to depend on collective contributions of intrinsic 
affinity of Tus for Ter, stability of the flipped C6 in its binding pocket, 
and rate-dependent induction of fork stoppage that fully or temporarily 
protects Tus-Ter central interactions from DnaB. Our observations 
also raise a question about how weaker Ter sites evolved to block slower 
forks (Supplementary Discussion 5). The encounter of dsDNA-binding 
proteins with motor proteins like helicases and polymerases is a
common feature of replication, repair, recombination and transcription,
and is where conflicts among these processes arise (reviewed in ref. 24).
We show for the first time that intrinsic heterogeneity in rates of 
individual molecular motors can have different biological outcomes 
as they communicate with dsDNA-binding proteins and other barriers 
(Supplementary Discussion 6). 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size, the experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Protein expression and purification. Previously described methods were used to prepare
N-terminally His6-tagged Tus and its mutant derivatives Tus(R198A) and
Tus(H144A), as well as the following E. coli DNA replication proteins: the β2 sliding
clamp, the Pol III τ3δδ′χψ clamp loader and αεθ core, the DnaB6(DnaC)6
helicase/loader complex and the fork restart proteins PriA, PriB and DnaT.
Crystallization of Tus-Ter complexes and data collection. Four crystal structures
of Tus-Ter complexes are reported (Extended Data Table 1; the oligonucleotide
sequences and proteins used are given in Extended Data Fig. 5). All complexes
(final protein concentration 4-5 mg ml⁻¹) were prepared with a slight excess of DNA in
10 mM Bis-Tris, pH 6.5, 1 mM EDTA, 2 mM dithiothreitol, and excess DNA was
removed using a centrifugal ultrafiltration device, as described previously.
Crystals were grown by the vapour diffusion (hanging drop) method at 23 °C.
The protein-DNA complex (3 µl) was mixed with an optimized reservoir solution
(3 µl) consisting of 8-12% PEG 3350, 0.1-0.2 M NaI, 50 mM Bis-Tris, pH 6.2-6.8.
Crystals appeared after 2 days and reached maximum size after 10 days. The pH
was measured using 1 M stock solutions of buffer before addition to the reservoir.
Optimized reservoir solutions for the four crystals contained: for Tus-UGLT fork
(forked Ter with C6 to G change), 12% PEG 3350, 0.2 M NaI, 50 mM Bis-Tris pH
6.2; for Tus-TGTA fork (forked Ter with C6 to G change and the fork extended to
position 7), 9% PEG 3350, 0.2 M NaI, 50 mM Bis-Tris pH 6.8; for Tus(H144A)-WT
fork (‘wild-type’ forked Ter with Tus(H144A)), 8% PEG 3350, 0.1 M NaI,
50 mM Bis-Tris pH 6.2; for Tus-UGLC (dsTerA with GC(6) to CG flip), 12%
PEG 3350, 0.2 M NaI, 50 mM Bis-Tris pH 6.5.

All X-ray data were collected at the Australian Synchrotron beamline MX-1 (X-ray
wavelength, 0.95370 Å) using an Oxford cryostream to maintain the crystal
temperature at 100 K. Before cooling, crystals were transferred stepwise into
artificial mother liquors finally containing 15% (v/v) MPD (2-methyl-2,4-pentanediol)
in 3% increments of MPD (3 min per step). Data were collected using an
ADSC Quantum 210r area detector, using BLU-ICE for remote data acquisition
and processing. Data reduction and scaling were achieved with the HKL2000
package.

Structure determination and refinement. All structures were solved by molecular
replacement in MOLREP using a previously solved Tus-Ter lock (PDB code:
2I06) or Tus-TerA structure (2I05) as starting model. REFMAC was used for
structure refinement and calculation of map weighting factors. COOT was used
to interpret electron density maps and for model building. Figures were prepared
using PyMOL.

Assessment of Tus-TerB interactions by surface plasmon resonance (SPR).
Methods were essentially as used previously, except that all experiments were
carried out at 20 °C (instead of 25 °C) and a 6 × 6 multiplex Bio-Rad ProteOn XPR36
system was used instead of a Biacore 2000 instrument. Dissociation rate constants
(kd) of Tus proteins from immobilized biotinylated TerB showed an unusual
temperature dependence (high activation energy), which accounts for the lower values
of kd and of the dissociation constant (KD) compared with previously reported values
(where available).

All measurements used SPR buffer (50 mM Tris pH 7.6, 250 mM KCl, 0.25 mM
EDTA, 0.5 mM dithiothreitol, 0.005% surfactant P20), with a ProteOn NLC
(neutravidin-coated) sensor chip for immobilization of 5′-biotinylated TerB
oligodeoxyribonucleotides (oligos). These were either (1) 5′-bio-(pD)10-ATAAGTATGTTGTAACTAAAG,
oligo-1, or (2) 5′-bio-(pD)10-GGGGCTATGTTGTAACTAAAG, oligo-2, each containing
a 10-unit abasic deoxyribosephosphate spacer (pD)10 to move the TerB molecule
away from the chip surface, as well as a common lagging-strand TerB sequence
(underlined). Hybridization of oligo-3 (5′-CTTTAGTTACAACATACTTAT; C6 of Ter
in bold) to oligo-1 produces a full dsTerB site, whereas its hybridization to oligo-2
produces a forked Ter in which C6 is unpaired and exposed (mismatched sequences
in oligos-2 and -3 are in italics).
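Which positions of the immobilized strands are paired by oligo-3 can be checked directly from the sequences above. The sketch below uses only the TerB segments (biotin and the (pD)10 spacer omitted) and reports positions counted from the 5′ end of oligo-1/-2; how those positions map onto the Ter numbering used in the text (for example, C6) is not asserted here. The constant names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatch_positions(top, bottom):
    """1-based positions, from the 5' end of `top`, where `bottom` cannot pair.

    `bottom` is written 5'->3'; its reverse complement is the sequence that a
    perfectly complementary `top` strand would have.
    """
    partner = reverse_complement(bottom)
    if len(partner) != len(top):
        raise ValueError("sequences must be the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(top.upper(), partner)) if a != b]


# TerB segments from the Methods (5'->3'), spacer and biotin omitted.
OLIGO_1_TER = "ATAAGTATGTTGTAACTAAAG"
OLIGO_2_TER = "GGGGCTATGTTGTAACTAAAG"
OLIGO_3 = "CTTTAGTTACAACATACTTAT"

print(mismatch_positions(OLIGO_1_TER, OLIGO_3))  # [] -> fully paired dsTerB
print(mismatch_positions(OLIGO_2_TER, OLIGO_3))  # [1, 2, 3, 4, 5] -> forked end
```

Oligo-3 pairs perfectly with oligo-1, while oligo-2 leaves a five-nucleotide mismatched region at its 5′ end, which is what produces the forked template.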

All 36 interaction spots of the sensor chip were activated with three sequential
injections of 1 M NaCl, 50 mM NaOH across six vertical (ligand) flow paths
(40 s each at 40 µl min⁻¹) and six horizontal (analyte) flow paths (40 s each at
100 µl min⁻¹). The surface was further stabilized by two injections of 1 M MgCl2 in
each direction, with the same contact times and flow rates. Oligos-1 and -2 were
diluted to 200 nM in SPR buffer and immobilized separately onto the six interaction
spots of the vertical flow path (100 µl min⁻¹ for 15 s). The chip was then
rotated 90°, and simultaneous assembly of dsTerB and forked Ter templates on the
chip surface was achieved by hybridization of oligo-3 (300 nM), which was flowed
across all six horizontal (analyte) channels at 25 µl min⁻¹ for 400 s. The sensorgram
verified that hybridization went to completion. After subsequent injection of
a concentration series of Tus, the surface was regenerated: remaining proteins and
hybridized DNAs were removed by two injections of 1 M NaCl, 50 mM NaOH
over the six analyte channels at 50 µl min⁻¹ for 40 s, followed by re-hybridization
of oligo-3 as above. Measured stoichiometries of Tus binding to both templates
were close to 1:1 at saturation, as reported previously.

Tus, Tus(R198A) and Tus(H144A) interactions with TerB and forked Ter
templates were measured by sequential injections in the analyte direction of
one or two appropriate concentration series in SPR buffer (zero and five
concentrations of serially diluted samples) at 40 µl min⁻¹ for 300 s, followed by
dissociation in the same buffer over 2,000 s. The final sensorgrams were corrected
by interspot and unmodified ligand flow path subtraction using ProteOn Manager Software
(v. 3.1.0.6), and then zero-subtracted and normalized based on the highest response
of hybridized oligo within the discrete ligand flow path using BIAevaluation
software (v. 4.0.1; Biacore AB, Sweden). Equilibrium (dissociation constant, KD)
and kinetic (rate constants, ka and kd) parameters for the binding of Tus proteins to
the Ter fragments were determined by global (simultaneous) fitting of at least five
sensorgrams per measured interaction from the optimized concentration range
using BIAevaluation software and the appropriate interaction model(s):
(Langmuir) 1:1 binding with mass transfer model (LMT, for the Tus-dsTerB
interaction, as done previously), (Langmuir) 1:1 binding model (L, for Tus- and
Tus(R198A)-forked Ter), and 1:1 steady-state affinity (LSS) and heterogeneous
ligand-parallel reactions (HLPR) binding models for fitting sensorgrams that
reached an equilibrium response (Tus(R198A)-dsTerB, Tus(H144A)-dsTerB
and Tus(H144A)-forked Ter).
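For orientation, the simplest of these models, the 1:1 Langmuir (L) model, has closed-form association and dissociation phases. The sketch below simulates one sensorgram with the 300 s injection and 2,000 s dissociation used here; it omits the mass-transport term of the LMT model and the second site of the HLPR model, does not reproduce BIAevaluation's global fitting, and the ka, kd, Rmax and concentration values in the usage block are hypothetical.

```python
import math


def association(t, conc_m, ka, kd, rmax):
    """1:1 Langmuir response (RU) at time t (s) during injection at conc_m (M).

    Solves dR/dt = ka*C*(Rmax - R) - kd*R with R(0) = 0.
    """
    kobs = ka * conc_m + kd
    r_eq = ka * conc_m * rmax / kobs
    return r_eq * (1.0 - math.exp(-kobs * t))


def dissociation(t, r0, kd):
    """1:1 Langmuir response (RU) at time t (s) after the injection ends."""
    return r0 * math.exp(-kd * t)


def sensorgram(conc_m, ka, kd, rmax, t_assoc=300, t_dissoc=2000, dt=1):
    """(time, RU) pairs: injection for t_assoc s, then buffer for t_dissoc s.

    t_assoc, t_dissoc and dt must be integers (seconds).
    """
    points = [(float(t), association(t, conc_m, ka, kd, rmax))
              for t in range(0, t_assoc + 1, dt)]
    r_end = points[-1][1]
    points += [(float(t_assoc + t), dissociation(t, r_end, kd))
               for t in range(dt, t_dissoc + 1, dt)]
    return points


if __name__ == "__main__":
    # Hypothetical values: ka = 1e6 /M/s, kd = 1e-3 /s, Rmax = 700 RU, 10 nM analyte.
    trace = sensorgram(10e-9, 1e6, 1e-3, 700.0)
    print(len(trace), round(trace[300][1], 1), round(trace[-1][1], 1))
```

A global fit would adjust ka, kd (and, unless constrained, Rmax) so that curves like this match all concentrations of a series at once.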

Global best fits were used when the LSS and HLPR models were used. When the L and
LMT models were used, the fitting was constrained by setting Rmax to a global
constant value (the response at saturation of ligand binding sites was set to 700
response units (RU) for binding to dsTerB and 775 RU for forked Ter). These
values, calculated theoretically as the product of the highest measured response of
hybridized oligo-3 (molecular weight 6,354; used as a normalization unit) onto
oligo-1 (120 RU) or oligo-2 (134 RU) and the factor 5.8 (molecular weight of
His6-Tus/molecular weight of hybridized oligo-3 = 36,737/6,354), were compared
with experimentally determined values obtained by flowing Tus at a saturating
concentration (1.024 µM) over the two DNA templates (not shown). In
addition, owing to slow dissociation, experimentally determined kd values for Tus-
and Tus(R198A)-forked Ter interactions using the L model were assessed by
comparison with the kd values determined from an experiment in which
dissociation was monitored over 50,000 s (not shown). To generate the most reliable
possible values for kinetic parameters using the HLPR model, ka and kd were estimated in
the first approximation based on the complete association phase and only the initial
phase of dissociation, where the rate of change is greatest. These values
of kinetic parameters were sometimes used as initial iterative values; otherwise, the
iterations could slip into local minima without reaching a sensible solution. Only
the fitted kinetic parameters of the prevalent (dominant) reaction in the HLPR model
are presented in Extended Data Fig. 8g. For assessment, KD values calculated
from the obtained kinetic parameters (kd/ka) were compared with KD values
obtained directly using the LSS model (Extended Data Fig. 8g).
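The Rmax constraint and the KD check described above are simple arithmetic. The sketch below recomputes them from the molecular weights and oligo-3 responses quoted in this paragraph; the function names are illustrative, and the ka and kd values in the test are hypothetical.

```python
MW_HIS6_TUS = 36_737   # molecular weight of His6-Tus, from the Methods
MW_OLIGO_3 = 6_354     # molecular weight of hybridized oligo-3, from the Methods


def mw_factor(mw_analyte, mw_reference):
    """Ratio used to scale a measured DNA response to the expected protein response."""
    return mw_analyte / mw_reference


def theoretical_rmax(reference_response_ru, factor):
    """Expected response at saturation, assuming 1:1 binding."""
    return reference_response_ru * factor


def dissociation_constant(k_d, k_a):
    """Equilibrium dissociation constant KD (M) from kd (1/s) and ka (1/M/s)."""
    return k_d / k_a


factor = mw_factor(MW_HIS6_TUS, MW_OLIGO_3)   # about 5.78; rounded to 5.8 in the Methods
for template, oligo3_ru in (("dsTerB", 120.0), ("forked Ter", 134.0)):
    print(template, round(theoretical_rmax(oligo3_ru, 5.8), 1))
```

With the rounded factor of 5.8 this gives about 696 RU for dsTerB and 777 RU for forked Ter, consistent with the 700 RU and 775 RU constants used for the constrained fits.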

Single molecule flow stretching assays: DNA substrate constructs. Bacteriophage
λ DNA was modified by ligating a biotinylated fork on one end and a digoxigenin
moiety at the other end as described previously. This ligated product was
digested with either EcoRI or ApaI to generate 3.6- and 10.1-kb fragments from
the forked and digoxigenin ends, respectively. An oligonucleotide sequence
containing a single copy of the wild-type or a variant TerB site was ligated to the
digested ends of the 3.6- and 10.1-kb fragments as described previously to
generate DNA constructs with the variant TerB sites listed in Table 1 and
Extended Data Fig. 1b.

Force calibration. A force-extension curve was constructed by measuring the
length of individual 13.7 kb DNA molecules and calculating the hydrodynamic
drag force at different flow rates using the equipartition theorem, as
described previously. The force-extension curve was fitted using the worm-like chain
model. Fluctuations in the laminar flow cause an error in estimating the
force, and consequently the length of individual DNA molecules, which results in
an error in estimating the location of the TerB site relative to the fork. At the
applied stretching force of 2.6 pN in our experiments, the error in estimating
the force, derived from the standard deviation among seven DNA molecules in
the same field of view, results in an error of ± ~85 bp in estimating the position
of the TerB site at 3.6 kb from the site of fork assembly. Consequently, we treated
any replication event ending between 3.5 and 3.7 kb as being stopped at the TerB site.
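The calibration steps can be sketched as follows. This is an illustration under stated assumptions, not the calibration code used here: it takes the common tethered-bead form of the equipartition relation, F = kBT·l/⟨δx²⟩ (l, tether extension; ⟨δx²⟩, variance of transverse bead fluctuations), the Marko-Siggia interpolation formula for the worm-like chain, a persistence length of 50 nm, 0.34 nm per bp and a default temperature of 298.15 K, all of which are assumptions. The last function encodes the 3.5-3.7 kb rule stated above.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)


def kbt_pn_nm(temp_k):
    """Thermal energy in pN*nm (1 J = 1e21 pN*nm)."""
    return K_B * temp_k * 1e21


def force_from_fluctuations(extension_nm, transverse_var_nm2, temp_k=298.15):
    """Equipartition estimate F = kBT * l / <dx^2> for a tethered bead (pN)."""
    return kbt_pn_nm(temp_k) * extension_nm / transverse_var_nm2


def wlc_force(extension_nm, contour_nm, persistence_nm=50.0, temp_k=298.15):
    """Marko-Siggia worm-like chain interpolation formula (pN)."""
    x = extension_nm / contour_nm
    return kbt_pn_nm(temp_k) / persistence_nm * (0.25 / (1.0 - x) ** 2 - 0.25 + x)


def wlc_extension(force_pn, contour_nm, persistence_nm=50.0, temp_k=298.15):
    """Invert wlc_force by bisection (force increases monotonically with extension)."""
    lo, hi = 0.0, contour_nm * (1.0 - 1e-12)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if wlc_force(mid, contour_nm, persistence_nm, temp_k) < force_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def stopped_at_ter(end_kb, window=(3.5, 3.7)):
    """Methods rule: events ending within 3.5-3.7 kb count as stopped at TerB."""
    return window[0] <= end_kb <= window[1]


if __name__ == "__main__":
    contour = 13_700 * 0.34  # nm; assumes 0.34 nm per bp for B-form dsDNA
    ext = wlc_extension(2.6, contour)
    print(round(ext / contour, 3), stopped_at_ter(3.62), stopped_at_ter(3.9))
```

For experiments at 32 °C, `temp_k=305.15` would be passed instead of the default.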
Single-molecule leading-strand synthesis assay. Leading-strand DNA synthesis and
data analysis were performed as described previously, with the
addition of Tus to the reaction. Briefly, Tus was first introduced under
continuous flow at 80 nM in buffer containing 30 mM Tris-HCl pH 7.6,
50 mM NaCl, 0.5 mM EDTA, 5 mM dithiothreitol and 10 mM MgCl2 for 30 min
to ensure binding of Tus to TerB. Excess DNA-unbound Tus was removed
by washing with 15 flow-cell volumes of replication buffer containing
50 mM HEPES-KOH pH 7.9, 80 mM KCl, 12 mM Mg(OAc)2, 2 mM MgCl2,
5 mM dithiothreitol and 0.1 mg ml⁻¹ BSA. Tus was then reintroduced with the
replication proteins under continuous flow in replication buffer supplemented
with 760 µM of each dNTP, 1 mM ATP and proteins as follows: 80 nM Tus, 30 nM
τ3δδ′χψ, 30 nM DnaB6(DnaC)6 helicase-loader complex, 30 nM β2 clamp, 60 nM
αεθ core Pol III, and the fork restart proteins (20 nM PriA, 40 nM PriB and 480 nM
DnaT). Experiments were carried out at 32 °C.

For data analysis, the picked particles were first corrected for their Brownian
motion using unreplicated tethered DNA molecules. DNA synthesis was considered
paused when the amplitude fluctuations of a minimum of six consecutive data points
(acquisition rate 2 Hz) were less than three times the standard deviation of the
noise. Bead displacement was converted into numbers of nucleotides synthesized
using the known length difference between ss- and dsDNA in λ DNA (3.76 bases
per nm at our applied stretching force of 2.6 pN). Total experimental time was 30
min. In the study of the effect of Tus concentration on Tus-TerB polar arrest
activity (Fig. 1f), Tus was first pre-incubated with the DNA at 80 nM and excess
Tus was washed out as described above for our standard experimental condition.
This was followed by the introduction of either 20 or 80 nM Tus with the
replication proteins. Tus(H144A) was used at a concentration of 80 nM, whereas
Tus(R198A) was used at 250 nM throughout the reaction. Multiplexed single-molecule
experimental results were derived from three or four technical replicates
for each experimental condition.
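The pause criterion and the displacement-to-nucleotide conversion can be expressed compactly. The sketch below is one possible reading of the criterion, which is an assumption: a pause is a run of at least six consecutive samples whose peak-to-peak range stays below three times the noise standard deviation, grown greedily along the trace. The constants come from the paragraph above; the function names and the example trace are hypothetical.

```python
ACQUISITION_HZ = 2.0    # from the Methods
MIN_POINTS = 6          # from the Methods (6 points at 2 Hz = 3 s)
NOISE_MULTIPLE = 3.0    # from the Methods
NT_PER_NM = 3.76        # from the Methods, valid at 2.6 pN


def displacement_to_nt(displacement_nm):
    """Convert bead displacement (nm) into nucleotides synthesized."""
    return displacement_nm * NT_PER_NM


def find_pauses(trace_nm, noise_sd_nm, min_points=MIN_POINTS, k=NOISE_MULTIPLE):
    """Return (start, end) index pairs, end exclusive, of paused segments.

    A segment counts as a pause when it spans at least `min_points`
    consecutive samples whose peak-to-peak range stays below
    k * noise_sd_nm (assumed reading of the criterion).
    """
    pauses = []
    start, n = 0, len(trace_nm)
    while start < n:
        lo = hi = trace_nm[start]
        end = start + 1
        while end < n:
            new_lo = min(lo, trace_nm[end])
            new_hi = max(hi, trace_nm[end])
            if new_hi - new_lo >= k * noise_sd_nm:
                break                      # fluctuation too large: segment ends
            lo, hi = new_lo, new_hi
            end += 1
        if end - start >= min_points:
            pauses.append((start, end))
            start = end                    # continue after the detected pause
        else:
            start += 1
    return pauses


def pause_durations_s(pauses, rate_hz=ACQUISITION_HZ):
    """Duration in seconds of each (start, end) pause."""
    return [(end - start) / rate_hz for start, end in pauses]


if __name__ == "__main__":
    trace = [0, 10, 20, 30, 40, 50] + [60] * 8 + [70, 80, 90]  # hypothetical, nm
    found = find_pauses(trace, noise_sd_nm=1.0)
    print(found, pause_durations_s(found))  # [(6, 14)] [4.0]
```

Requiring six points at 2 Hz means the shortest detectable pause is 3 s, matching the threshold used to classify restarts.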

The portion of leading-strand synthesis trajectories that randomly terminated
before reaching the position of the Ter site at 3.6 ± 0.1 kb (Extended Data Fig. 3a)
was excluded from analysis. Those that reached the Ter site were separated into
three categories: (1) those that continued unimpeded through the Ter site
(‘bypass’), (2) those that were ‘permanently’ arrested for the entire period of
observation (9.7 ± 1.0 min; ‘stop’), and (3) those that paused (for ≥3 s, see above)
and then resumed (‘restart’).
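The three-way classification can be written as a small decision rule. In this sketch each trajectory is summarized, by assumption, as the position where synthesis ended and the duration of any pause detected inside the Ter window; trajectories that ended inside the window for reasons unrelated to Tus cannot be distinguished by this rule. The names are illustrative, and the example events are hypothetical.

```python
from collections import Counter

TER_WINDOW_KB = (3.5, 3.7)   # 3.6 ± 0.1 kb, from the Methods
MIN_PAUSE_S = 3.0            # six points at 2 Hz, from the Methods


def classify_event(final_kb, pause_at_ter_s=0.0):
    """Assign one trajectory to 'excluded', 'stop', 'restart' or 'bypass'.

    final_kb: position (kb) where synthesis ended within the observation period.
    pause_at_ter_s: duration (s) of any pause detected inside the Ter window.
    """
    lo, hi = TER_WINDOW_KB
    if final_kb < lo:
        return "excluded"      # terminated before reaching Ter
    if final_kb <= hi:
        return "stop"          # still arrested at Ter when observation ended
    if pause_at_ter_s >= MIN_PAUSE_S:
        return "restart"       # paused at Ter, then resumed
    return "bypass"            # passed Ter without a detectable pause


def outcome_fractions(events):
    """Fractions of stop/restart/bypass among events that reached Ter."""
    counts = Counter(classify_event(*event) for event in events)
    counts.pop("excluded", None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: counts[key] / total for key in ("stop", "restart", "bypass")}


if __name__ == "__main__":
    events = [(2.1, 0.0), (3.6, 0.0), (3.55, 0.0), (9.8, 14.0), (12.0, 0.0)]
    print(outcome_fractions(events))  # {'stop': 0.5, 'restart': 0.25, 'bypass': 0.25}
```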
[Figure residue: TA(5)-NPTerB oligonucleotide, NP face]

Extended Data Figure 1 | Setup for leading-strand replication assays. a, A
schematic representation of the 13.7 kb DNA substrate construct. The substrate
contains a biotinylated fork at one end to attach it to the streptavidin-coated
glass coverslip and a digoxigenin moiety at the other end to attach it to a 2.8 µm
diameter anti-digoxigenin-coated paramagnetic bead. A single TerB site
is located 3.6 kb from the biotinylated fork. b, Oligonucleotides used to
assemble wild-type and variant TerB substrates for their ligation to the
3.6 kb EcoRI and 10.1 kb ApaI λ DNA fragments. Native TerB residues are
highlighted in yellow except C6, which is in red. Non-native (modified) residues in
TerB are highlighted in grey. Native TerH residues are highlighted in orange.
Leading and lagging DNA strands as well as permissive (P) and non-permissive
(NP) faces of Ter when bound to Tus are denoted. The direction of
translocation of DnaB, which encircles the lagging strand as it unwinds dsDNA
during leading-strand DNA synthesis by Pol III holoenzyme, is denoted by
arrows.
Extended Data Figure 2 | Examples of trajectories for leading-strand
synthesis upon encountering Tus bound to non-permissive Ter sites. The
location of the TerB site at 3.6 ± 0.1 kb is indicated by the dashed lines. The
rates of leading-strand synthesis were calculated by fitting the slopes of the
trajectories by least-squares linear regression. The replisomes
displayed heterogeneity in rates of DNA synthesis. a, Trajectories where forks
stopped at the NPTerB site. The average stoppage time captured within our
acquisition time was 9.7 ± 1 min (uncertainty is the standard error), as
illustrated for the top trajectory. b, Trajectories where forks displaced Tus and
bypassed the NPTerB site without displaying any transient stoppage. c, Fate of
the replication fork upon encountering CG(6)-NPTerB. Examples of
trajectories for leading-strand synthesis upon encountering Tus bound to
CG(6)-NPTerB showing transient stoppage at CG(6)-NPTerB, followed by
resumption of DNA synthesis; 56% of the restarted events displayed DNA
synthesis with disrupted behaviour (top row) while 44% showed normal
behaviour (bottom row). We attributed the disrupted restart of DNA synthesis
in some of the trajectories to the replisome losing some components other than
DnaB during stalling.
Extended Data Figure 3 | Effect of TerB site alone and of nonspecifically
DNA-bound Tus on DNA synthesis. a, Probability of termination of DNA synthesis
at 0.2 kb intervals (spatial resolution of the assay) along the 13.7 kb NPTerB in
the absence of Tus, showing that stops at TerB (3.5-3.7 kb, denoted by black arrow)
occur randomly with a 3% probability when all events were considered, in
contrast to 5% when only events that reached TerB (≥3.5 kb) were taken into
account. b, Processivity of DNA synthesis on the NPTerB substrate in the
absence of Tus. The processivity distribution is fit with an exponential decay
(N = 88) and uncertainty corresponds to the standard error, illustrating the
random stoppage behaviour of the replisome during synthesis. c, Rate of
leading-strand synthesis using the 13.7 kb force-calibrated DNA construct
(NPTerB in this case) in the absence (left panel; N = 94) or presence of Tus
(right panel; N = 69). The rate distributions were fit with a Gaussian
distribution. The fit lines are shown and the uncertainties correspond to the
standard error. The rate agrees with our previously reported rate using
force-calibrated λ DNA constructs, demonstrating the accurate force calibration of
the 13.7 kb substrate.
Extended Data Figure 4 | Linear fitting of the rate of leading-strand
synthesis is appropriate for deriving the correlation between rate of DNA
synthesis and stalling activity at the NPTerB site. a, Rate dependence of fork
arrest at NPTerB. A scatter plot of forks that stopped (N = 31) or bypassed
(N = 32) Tus bound to NPTerB; rates were calculated by fitting the DNA
shortening phase of the entire trajectory for events that bypassed and
up to the stoppage point for events that stopped/restarted (histograms are shown
in Fig. 1g). A significant correlation between fork progression rate and fork
bypass at NPTerB is observed using a one-sided Pearson's correlation test at
the 0.05 level of significance (the calculated correlation coefficient (r) was 0.62).
The Pearson's correlation coefficient was calculated using the equation

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

where \bar{x} and \bar{y} are the sample means of the paired observations x_i and y_i. b, Scatter plot (left) and rate
distributions (right) of leading-strand synthesis for events that bypassed (grey
bars) (N = 32) or stopped/restarted (blue bars) (N = 31) at NPTerB when
the rate was estimated from fitting the slope of the three data points before the
TerB site (acquisition time is 0.5 s per data point). The rates were fit with a
Gaussian distribution and uncertainty corresponds to the standard error. The
calculated average rates for events that bypassed or stopped/restarted at
NPTerB are similar to those calculated when the rates were fit using the DNA
shortening phase of the entire trajectory for events that bypassed and
up to the stoppage point for events that stopped/restarted (shown in Fig. 1g
in the main text), underscoring the suitability of linear fitting of the rate.


Furthermore, r² from linear regression fits was 0.95 ± 0.05. c, The correlation
between apparent fluctuation in rate of DNA synthesis within individual DNA 
molecules and their corresponding Brownian motion (N = 23). d, The individual 
trajectories displayed apparent fluctuation in rate of DNA synthesis as illustrated 
in a representative trajectory where we zoomed in at the DNA shortening phase 
and fit the rate linearly to intervals of three consecutive data points. The 
percentage of apparent fluctuation in rate of DNA synthesis within individual
DNA molecules was calculated by dividing the standard deviation of the interval
rates by the average rate. The standard deviation of the Brownian motion of each
individual DNA molecule was calculated from the
fluctuation of the DNA before and after being replicated. The percentage of
apparent fluctuation in rate of individual DNA molecules displayed a strong
positive correlation with their corresponding Brownian motion when analysed
by a two-sided Pearson's correlation test at the 0.05 level of significance (r = 0.81,
panel c). e, The correlation between the percentage of apparent fluctuation in
rate and the average rate of individual molecules. The percentage of apparent
fluctuation in rate of individual molecules was calculated as described in d
and for the same 23 replisomes. There was no correlation between the average
rate of individual DNA molecules and their corresponding percentage of 
apparent fluctuation in rate; the Pearson’s correlation coefficient was -0.18. 
The results from c-e demonstrate that one strong factor behind the apparent 
fluctuation in rates within our individual 13.7 kb molecules under our spatial 
and temporal resolution is the Brownian motion of the DNA and that this 
apparent fluctuation in rate does not bias the estimates of speed of the replisomes. 
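The Pearson coefficient defined in the caption, and the percentage of apparent rate fluctuation from panel d, can be computed as below. This is a sketch: the t statistic r·√((n-2)/(1-r²)), with n-2 degrees of freedom, is the standard test statistic for a Pearson correlation, but whether the authors computed their test exactly this way, and the choice n = 63 (the two groups in panel a combined) in the usage block, are assumptions.

```python
import math
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient, written out as in the caption."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def percent_rate_fluctuation(interval_rates):
    """Standard deviation of interval rates as a percentage of their mean."""
    return 100.0 * statistics.stdev(interval_rates) / statistics.fmean(interval_rates)


if __name__ == "__main__":
    print(round(t_statistic(0.62, 63), 2))   # r from panel a; n = 31 + 32 assumed
    print(percent_rate_fluctuation([10.0, 12.0, 8.0]))
```

With r = 0.62 and n = 63 the statistic is about 6.2, well above the one-sided 5% critical value of roughly 1.67 for about 60 degrees of freedom; an exact p-value would need a t-distribution function such as the one in SciPy.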
Extended Data Figure 5 | Crystal structures of Tus complexes with Ter
oligonucleotides. The sequences of oligonucleotides used for each complex are
shown at the bottom of each panel; nucleotides for which electron density
could not be interpreted are highlighted. a-d, Complexes of Tus proteins with
forked Ter sites. The C6-binding pocket is shown in the circle, with key
residues Ile79, Phe140 and His144 in the binding pocket, and Arg198, shown in
stick form. a, The wild-type Tus-Ter lock (PDB code: 2I06), with C6 located in
the binding pocket, and the TA(7) base pair melted. Arg198 is positioned to
interact with the 5′-phosphate of T7. b, Complex of wild-type Tus with a forked
oligonucleotide that has C6 substituted by a mispaired G (UGLT: upper G,
lower T; PDB code: 4XR0); G6 neither occupies the pocket nor makes any
new specific interactions with Tus, and Arg198 no longer interacts with the
5′-phosphate of T7. c, Further extension of the mismatched region in b to
include A7 (TGTA: mispaired TGTA on the lower strand; PDB code: 4XR1)
does not enable G6 to occupy the C6-binding pocket or form any new specific
interactions. d, Tus(H144A) in complex with the normal Tus-Ter lock
oligonucleotide (PDB code: 4XR2), showing that the mispaired C6 does not occupy
the cytosine-binding pocket or form any new interactions with Tus.

e, f, Potential interactions of Arg198 in crystal structures of Tus complexes with
fully base-paired Ter oligonucleotides. Only nucleotides in base pairs 5 and 6
are shown, and they are colour-coded to match their stick representations
in the figures. Arg198 is shown in yellow stick representation. e, Structure
of the wild-type Tus-TerA (GC(6)) complex (PDB code: 2I05). Arg198 is
positioned potentially to make H-bonding interactions with the A5, G6 and T5
bases and the deoxyribose ring oxygen of G6, as well as electrostatic interactions
with the 5′-phosphate of A5, as suggested previously and demonstrated by
molecular dynamics simulations (A.J.O., unpublished observations).

f, Structure of the complex with a GC(6)-flipped version of the TerA
oligonucleotide (UGLC: upper G, lower C; PDB code: 4XR3) showing an
alternate major conformation of the Arg198 side chain that has lost all
base-specific interactions; only the interaction with the sugar ring oxygen of the
substituted C6 is maintained.
Extended Data Figure 6 | Fate of the replication fork upon encountering
5-mismatch G(6)-NPTerB. a, Examples of trajectories of replication forks that 
transiently stopped at Tus bound to the bubble template with C6 switched to G6 
(5-mismatch G6-NPTerB). b, The distribution of the pause durations fit with a 
single exponential decay. The fit line is shown in black and the uncertainty 
corresponds to the standard error (N = 20). 
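The caption fits a single exponential decay to the histogram of pause durations. As a swapped-in alternative that is not the authors' procedure, the mean lifetime of exponentially distributed durations can be estimated by maximum likelihood without binning. The sketch assumes independent pauses that all ended within the observation window (no censoring); the function names and example durations are hypothetical.

```python
import math
import statistics


def exponential_lifetime(durations_s):
    """Maximum-likelihood mean lifetime (s) and its standard error.

    For uncensored, exponentially distributed durations the MLE of the mean
    lifetime is the sample mean, and its standard error is tau / sqrt(N).
    """
    n = len(durations_s)
    if n == 0:
        raise ValueError("need at least one duration")
    tau = statistics.fmean(durations_s)
    return tau, tau / math.sqrt(n)


def survival_fraction(durations_s, t):
    """Empirical fraction of pauses lasting longer than t seconds."""
    return sum(d > t for d in durations_s) / len(durations_s)


if __name__ == "__main__":
    pauses = [4.0, 9.5, 12.0, 21.0, 33.5]   # hypothetical durations, s
    tau, se = exponential_lifetime(pauses)
    print(round(tau, 1), round(se, 1))
    # Compare the empirical survival curve with the fitted exp(-t / tau).
    for t in (5.0, 15.0, 30.0):
        print(t, survival_fraction(pauses, t), round(math.exp(-t / tau), 2))
```

Binned fits and the maximum-likelihood estimate should agree for large N; for small samples such as N = 20 the unbinned estimate avoids sensitivity to the choice of histogram bins.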
Extended Data Figure 7 | Fate of the replication fork upon encountering
NPTerB sites with swapped sequences in the first five base pairs, TA(6)-NPTerB
and NPTerH. Rate dependence of replication fork arrest at Tus
bound to: a, TA(5)-NPTerB (N = 25); b, swapped F4n GC(6)-NPTerB
(N = 29); c, swapped F5n GC(6)-NPTerB (N = 36). The rate distributions of
leading-strand synthesis for events that bypassed (grey bars) or stopped/restarted
(blue bars) at these sequences are shown. d, Examples of trajectories of
leading-strand synthesis that transiently stopped at Tus bound to TA(6)-NPTerB. 75%
of the restarted events displayed normal DNA synthesis (left
traces) while 25% showed disrupted behaviour (right trace). e, The distribution
of the pause durations at TA(6)-NPTerB fit with a single exponential decay
(N = 8). f, The rate distribution of events that bypassed (N = 30; grey bars) or
stopped/restarted (N = 11; blue bars) at TA(6)-NPTerB. g, Examples of
trajectories of leading-strand synthesis that transiently stopped at Tus bound to
NPTerH. The average pause duration was 180 ± 26 s (N = 4). The uncertainty
is the standard error. h, The rate distribution of leading-strand synthesis for
events that bypassed (N = 26; grey bars) or stopped/restarted (N = 11; blue
bars) at NPTerH. The histograms in a-c, e, f and h were fit to Gaussian
distributions, the fit lines are shown, and the uncertainties correspond to the
standard error.
Extended Data Figure 8 | SPR assessment of Tus-TerB interactions:
whereas Tus and Tus(R198A) are capable of forming a lock, Tus(H144A) is
not. ProteOn sensorgrams show association and dissociation phases of
Tus-TerB interactions at ranges of Tus concentrations (as specified in g) of
serially diluted samples of Tus proteins. Curves, shown in colours, were fit
simultaneously (black curves) to various binding models (see Methods).

a, Wild-type Tus and dsTerB. Considering that a ka > 1 × 10° M⁻¹ s⁻¹
suggests significant mass transport limitations, the LMT model was used to fit
the data with Rmax constrained to 700 RU. The derived kinetic parameters were
used to simulate sensorgrams devoid of mass transfer limitation using the L
model (inset). b, Wild-type Tus-forked TerB interaction; Rmax was
constrained to 775 RU. The fit kd is in good agreement with the value of (5.20 ±
0.00) × 10⁻° s⁻¹ obtained from an independent experiment in which dissociation
was monitored over 50,000 s (not shown). c, Tus(R198A)-dsTerB interaction.
Binding kinetic parameters were obtained using the HLPR model. The sum
of the fit Rmax1 (543 ± 9 RU) and Rmax2 (54 ± 5 RU) values was in reasonable
agreement with the expected value of ~700 RU. Only the relevant ka and kd
values of the predominant (based on Rmax1) interaction are presented in g.
For assessment of the fitting procedure, responses at equilibrium were fit using
the L model (inset). The derived KD was within a factor of two of the
calculated KD obtained from kinetic parameters (kd/ka). The Rmax value of
816 ± 32 RU was slightly higher than theoretical (700 RU), probably owing
to some non-specific binding in the high range of Tus concentration. d,
Tus(R198A)-forked TerB interaction. The L model was used to fit the
data with Rmax constrained to 775 RU. The fit kd was within a factor of
two of the value, (5.70 ± 0.00) × 10⁻° s⁻¹, derived from an independent
experiment in which dissociation was monitored over 50,000 s (not shown). e,
Tus(H144A)-dsTerB interaction. Binding kinetic parameters were obtained
using the HLPR model. The sum of the fit Rmax1 (537 ± 1 RU) and Rmax2 (31 ± 0 RU)
values was in reasonable agreement with the expected value of ~700 RU. Only
the relevant ka and kd values of the predominant interaction (Rmax1) are
presented in g. For assessment of the fitting procedure, responses at equilibrium
were fit using the L model (inset). The derived KD was within a factor of
1.5 of the KD obtained from the kinetic parameters. In addition, the fit Rmax value
of 621 ± 10 RU compares reasonably with the expected value of ~700 RU. f,
Tus(H144A)-forked TerB interaction. Binding kinetic parameters were
obtained using the HLPR model. The sum of the fit Rmax1 (879 ± 4 RU) and Rmax2
(65 ± 1 RU) values was somewhat high compared with the expected value of
~775 RU. Only the relevant ka and kd values of the predominant reaction are
presented in g. Responses at equilibrium were fit using the L model (inset).
The derived KD was within a factor of 2 of the calculated KD obtained from
kd/ka. In addition, the fit Rmax value of 1,040 ± 50 RU was slightly higher than
theoretical. g, Summary of binding parameters for Tus-Ter interactions.

All uncertainties are standard errors in parameters from fitting of complete
data sets to appropriate binding models as described in the Methods. Data
are representative of those from two technical replicates using different
instruments (Biacore T200 and ProteOn XPR36).
Extended Data Figure 9 | Fate of the replication fork upon encountering
Tus(H144A) bound to NPTerB. Rate dependence of fork arrest. The rate 
distribution of leading-strand synthesis for events that bypassed (N = 18; grey 
bars) or stopped/restarted (N = 15; blue bars) at NPTerB fit with Gaussian 
distributions. The fit lines are shown and the uncertainties correspond to the 
standard error. 
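As a hedged illustration of the Gaussian description of the two rate populations, the sketch below substitutes a different technique from the one in the legend. It uses a maximum-likelihood Gaussian estimate on the individual rates rather than a least-squares fit to the histogram bars, and it reports the standard error of the mean. It also computes the expected count in each histogram bin, which is one way to draw a fit line over the bars. The function names are hypothetical.

```python
import math
import statistics


def gaussian_mle(rates):
    """Maximum-likelihood Gaussian fit to individual rates.

    Returns (mean, sd, sem): sd is the MLE (population) standard deviation
    and sem = sd / sqrt(N) is the standard error of the mean.
    """
    n = len(rates)
    mean = statistics.fmean(rates)
    sd = statistics.pstdev(rates, mu=mean)
    return mean, sd, sd / math.sqrt(n)


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def expected_bin_counts(edges, n, mean, sd):
    """Expected number of events in each histogram bin [edges[i], edges[i+1])."""
    return [
        n * (normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd))
        for lo, hi in zip(edges[:-1], edges[1:])
    ]
```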
Extended Data Table 1 | Data collection and refinement statistics for Tus–Ter complexes

Columns: (1) wt Tus / forked Ter-UGLT [C6 to G mutant]; (2) wt Tus / forked
Ter-TGTA [C6 to G, T7 to A mutant]; (3) Tus(H144A) / wt forked Ter;
(4) wt Tus / dsTer-UGLC.

                       (1)                   (2)                   (3)                   (4)
Data collection
Space group            P4,2,2                P4,2,2                P4,2,2
Cell dimensions
  a, b, c (Å)          64.5, 64.5, 248.3     64.8, 64.8, 246.7     64.5, 64.5, 250.9
  α, β, γ (°)          90, 90, 90            90, 90, 90            90, 90, 90
Resolution (Å)         75–2.80 (2.90–2.80)   75–2.40 (2.50–2.40)   75–2.35 (2.43–2.35)   75–2.70 (2.80–2.70)
Rsym                   11.1 (74.0)           12.0 (71.1)           8.0 (85.6)            10.6 (84.1)
I/σI                   39.5 (6.6)            24.0 (3.3)            26.7 (2.0)            27.2 (3.6)
Completeness (%)       99.8 (99.6)           99.9 (100)            99.6 (100)
Redundancy             22.1 (18.9)           9.7 (10.8)            6.6 (7.5)

Refinement
Resolution (Å)         45–2.80 (2.86–2.80)   63–2.40 (2.46–2.40)   62–2.35 (2.41–2.35)
No. reflections        12,822 (901)          20,288 (1,521)        21,819 (1,658)
Rwork / Rfree          21.7 / 29.6           19.8 / 26.9           22.0 / 26.6
No. atoms
  Protein              2,504                 2,530                 2,509
  Nucleic acid         577                   574
  Ligand/ion           13                    14                    14
  Water                31                    96                    69
B-factors
  Protein              25.3                  43.3                  39.3
  Nucleic acid         26.5                  45.1                  38.4
  Ligand/ion           28.7                  50.3                  58.1
  Water                35.7                  57.2                  48.1
R.m.s. deviations
  Bond lengths (Å)     0.016                 0.017                 0.013
  Bond angles (°)      2.16                  2.13                  1.71

A single crystal was used in each case. Numbers in parentheses refer to the
highest-resolution bin.

Integrator mediates the biogenesis of enhancer RNAs


Fan Lai*, Alessandro Gardini*, Anda Zhang & Ramin Shiekhattar


Integrator is a multi-subunit complex stably associated with 
the carboxy-terminal domain (CTD) of RNA polymerase II 
(RNAPII)'. Integrator is endowed with a core catalytic RNA endo- 
nuclease activity, which is required for the 3’-end processing 
of non-polyadenylated, RNAPII-dependent, uridylate-rich, small 
nuclear RNA genes’. Here we examine the requirement of Integrator 
in the biogenesis of transcripts derived from distal regulatory 
elements (enhancers) involved in tissue- and temporal-specific regu- 
lation of gene expression in metazoans”>. Integrator is recruited to 
enhancers and super-enhancers in a stimulus-dependent manner. 
Functional depletion of Integrator subunits diminishes the signal- 
dependent induction of enhancer RNAs (eRNAs) and abrogates 
stimulus-induced enhancer-promoter chromatin looping. Global 
nuclear run-on and RNAPII profiling reveal a role for Integrator
in 3’-end cleavage of eRNA primary transcripts leading to transcrip- 
tional termination. In the absence of Integrator, eRNAs remain 
bound to RNAPII and their primary transcripts accumulate. 
Notably, the induction of eRNAs and gene expression responsive-
ness require the catalytic activity of the Integrator complex. We
propose a role for Integrator in the biogenesis of eRNAs and in enhancer
function in metazoans.

To assess the role of Integrator in the biogenesis of eRNAs, we
examined the signal-dependent recruitment of the Integrator complex to
enhancer sites. HeLa cells were starved of serum for 48 h, after which 
they were stimulated with epidermal growth factor (EGF) to induce 
immediate early genes (IEGs). We identified 2,029 enhancers based on
their occupancy by RNAPII and CBP/p300 and the presence of the acetylated
histone H3 lysine 27 (H3K27ac) chromatin modification (see Methods).
We found that while assessing steady-state levels of eRNAs provided 
a measure of EGF-induced eRNAs, we obtained a better read-out of 
eRNAs after sequencing of the chromatin-enriched RNA fractions 
(ChromRNA-seq)°. We focused on 91 enhancers that displayed EGF- 
induced eRNAs in the proximity of EGF-responsive genes following 
20 min of induction (Extended Data Fig. 1, Supplementary Table 1
and see Methods). Notably, the chromatin surrounding these enhancers 
displayed the H3K27ac modification in starved cells, and following EGF 
stimulation there was a small increase in H3K27ac levels (Extended Data 
Fig. 1b). To assess the polyadenylation state of eRNAs, total RNA was 
separated into polyadenylated and non-polyadenylated fractions, which were
subjected to high-throughput sequencing. Similar to previous reports, 
EGF-induced enhancers displayed bi-directional eRNAs that were pre- 
dominantly not polyadenylated (Extended Data Fig. 2)°”. 

We next analysed Integrator occupancy at these enhancers by using 
antibodies against the INTS11 subunit of the Integrator complex 
before and after EGF stimulation. While these enhancers were occu- 
pied by a detectable amount of Integrator before EGF induction, addi- 
tion of EGF resulted in a further recruitment of Integrator complex 
(Fig. 1a-c). RNAPII displayed a similar pattern of stimulus-dependent
chromatin residence (Fig. 1d, e). The stimulus-dependent recruitment 
of Integrator at enhancers was further confirmed using two additional 
antibodies against INTS1 and INTS9 subunits of the Integrator 
complex (Extended Data Fig. 3a). These results demonstrated the 


stimulus-dependent recruitment of the Integrator complex at EGF- 
responsive enhancers. 

To examine the functional importance of Integrator at enhancers 
and its role in the biogenesis of eRNAs, we developed HeLa clones 
expressing doxycycline-inducible short hairpin RNAs (shRNAs) 
against INTS11 and INTS1 subunits of the Integrator complex 
(Extended Data Fig. 3b). Within the time course of these experiments 
the mature levels of small nuclear RNAs (snRNAs) were not perturbed 
(data not shown). Twenty minutes of EGF stimulation resulted in the 
induction of bi-directional eRNAs similar to previous reports (Fig. 1a, f 
and Extended Data Fig. 1c-h)°*"*. Depletion of INTS11 diminished 
the eRNA induction after EGF stimulation (Fig. 1f; as shown at two 
enhancer loci; enhancers were named after their proximity to an EGF- 
responsive gene). The fold induction of eRNAs at all EGF-induced 
enhancers decreased significantly (Fig. 1g, h). We also observed a 
significant decrease in the transcriptional induction of EGF-responsive 
protein-coding genes in the proximity of these EGF-induced enhan-
cers (Fig. 1g, h). Interestingly, there was a subtle (not statistically
significant) increase in H3K27 acetylation at enhancers following EGF
stimulation, which was reduced after Integrator depletion (Fig. 1f 
and Extended Data Fig. 3c). 

To gain further insight into quantitative changes in eRNAs follow- 
ing depletion of Integrator, we depleted INTS11 or INTS1 and per- 
formed a time-course analysis of eRNA induction using specific 
primer sets for each strand. Depletion of either Integrator subunit 
diminished the EGF-induced increase in eRNA levels from both 
strands of the enhancers (Extended Data Fig. 4a, b). Analysis of 
the regulatory landscape in the proximity of the EGF-responsive gene
ATF3 (activating transcription factor 3) revealed the presence of
clusters of acetylated H3K27 and p300 binding sites similar to those
described for super-enhancers'*'* (Extended Data Fig. 4c). This 
region also displayed occupancy by RNAPII at multiple sites, and 
we could detect additional recruitment of RNAPII and Integrator 
to these sites following EGF stimulation (Extended Data Fig. 4c). 
Analysis of eRNA synthesis using strand-specific RNA-seq and 
real-time PCR (during a time-course experiment) demonstrated a 
requirement for Integrator in the induction of eRNAs at the super- 
enhancer sites after EGF stimulation (Extended Data Fig. 4d). 
Collectively, these results highlight a requirement for Integrator in 
stimulus-dependent induction of eRNAs from individual enhancers 
and enhancer clusters. 

An important component of enhancer function is the formation of 
stimulus-dependent chromatin looping, allowing enhancer and pro- 
moter communication’*"'’. We measured chromatin looping between 
NR4A1 and DUSP1 enhancers and their respective promoters using 
chromosome conformation capture (3C) following stimulation with 
EGF (Fig. 2a). We observed a robust association between the enhancer 
and the promoter regions of NR4A1 and DUSP1 after EGF stimulation 
(Fig. 2b). Remarkably, depletion of Integrator abrogated the EGF- 
induced chromatin looping without any effect on non-stimulus- 
induced chromosomal interactions (Fig. 2b, c and Extended Data 
Fig. 5a, b). These results demonstrate that Integrator regulates 
enhancer function as reflected by the physical association between
enhancers and their respective promoters.

Figure 1 | Integrator mediates induction of eRNAs. a, EGF induction of an
enhancer in the vicinity of the NR4A1 gene (see Extended Data Fig. 1i). RNAPII
and INTS11 are recruited to the enhancer after 20 min of stimulation and
eRNAs are transcribed bi-directionally from the locus (as revealed by deep
sequencing of chromatin-associated RNA, ChromRNA-seq). The y axis
represents the read counts normalized to sequencing depth. b, Average profile
of Integrator recruitment to 91 EGF-responsive enhancers. TSS indicates
transcription start site. The y axis shows the average read density. c, Increased
Integrator occupancy at enhancers and their corresponding protein-coding
genes (mean density was calculated as follows: 6 kb surrounding the peak of
RNAPII for eRNAs; from −0.5 kb to +2.5 kb for coding genes; P < 0.001).
Whiskers on the box plot indicate the variability in the datasets. d, Average
profile of RNAPII upon EGF treatment at enhancers. e, Increased RNAPII
occupancy following EGF stimulation at enhancers and their corresponding
protein-coding genes (P < 0.005). f, Inducible knockdown of INTS11
(doxycycline (dox)) markedly reduces steady-state levels of eRNAs (as
measured by total RNA-seq). Data were obtained using a tet-inducible shRNA
system, stably transduced in HeLa cells. Acetylation of H3K27 is also shown.
g, h, Average expression levels of 91 eRNAs and their neighbouring (<500 kb)
57 protein-coding genes indicate a significant impairment of activation. Box
plots represent the expression fold change (log2) before and after EGF
treatment in normal conditions (ctrl) and upon depletion of Integrator (dox)
(t-test, P < 0.0005 for all panels). Fold change of RPKM (reads per kilobase of
exon per million mapped reads) values was calculated from RNA-seq (f) and
ChromRNA-seq (g) data.

To gain insight into the mechanism by which Integrator regu-
lates enhancer function and eRNA biogenesis, we depleted Integrator
and performed RNAPII profiling and global nuclear run-on followed
by high-throughput sequencing (GRO-seq) after EGF induction.
Notably, Integrator depletion resulted in an increase and spreading
of GRO-seq reads throughout the body of eRNA transcripts at both
enhancers and super-enhancers, which was mirrored by a concom-
itant increase and spreading of RNAPII localization (Fig. 3a, b).
Indeed, the average profile of depth-normalized reads of 91 EGF-
induced enhancers showed a significant accumulation of GRO-seq
and RNAPII ChIP-seq reads (Extended Data Fig. 6a, b). Analysis of
the RNAPII travelling ratio, a measure of RNAPII productive elongation,
revealed that in contrast to EGF-responsive protein-coding genes,
which experience a block in productive elongation after Integrator
depletion’*, there is increased RNAPII occupancy in the body of
eRNA transcripts (Extended Data Fig. 6c, d). The accumulation of
RNAPII at eRNA loci after Integrator depletion occurred despite the
decreased recruitment of the super elongation complex (SEC) to enhan-
cers (Extended Data Fig. 7a, b).
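The text does not give the windows used for the travelling ratio. A common definition divides RNAPII density in a promoter-proximal window (about −30 to +300 bp from the TSS) by the density further downstream. The sketch below borrows those boundaries from the protein-coding convention as assumptions. It takes per-base coverage already oriented 5′ to 3′ along the transcript, and the function names are hypothetical. Because body density sits in the denominator, extra RNAPII in the transcript body lowers the ratio.

```python
def mean_density(coverage, start, end):
    """Mean per-base coverage over the half-open window [start, end)."""
    window = coverage[start:end]
    if not window:
        raise ValueError("empty window")
    return sum(window) / len(window)


def travelling_ratio(coverage, tss, prox_up=30, prox_down=300, body_end=None):
    """RNAPII travelling ratio = promoter-proximal density / body density.

    coverage: per-base RNAPII ChIP-seq coverage oriented 5'->3' along the
    transcript; tss: index of the transcription start site in that list.
    The window sizes are assumed defaults, not values from the paper.
    """
    if body_end is None:
        body_end = len(coverage)
    proximal = mean_density(coverage, max(0, tss - prox_up), tss + prox_down)
    body = mean_density(coverage, tss + prox_down, body_end)
    return float("inf") if body == 0 else proximal / body
```

With the same proximal signal, doubling the body density halves the ratio. This is the direction expected for the increased RNAPII occupancy in eRNA bodies described above.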

The increased RNAPII occupancy at eRNA loci suggests a block 
in 3’-end cleavage of primary eRNA transcripts, leading to a defect 
in termination. To quantitate such a 3’-end cleavage defect, we 
measured the accumulation of primary (unprocessed) eRNA transcripts
after Integrator depletion using semi-quantitative
PCR and real-time PCR. We observed a 3- to 10-fold accumulation of 
unprocessed eRNA transcripts concomitant with the reduction of 
the processed eRNA levels (Fig. 3c-e and Extended Data Fig. 8a). 
Previous experiments revealed that the loss of 3’-end cleavage by 
Integrator led to increased levels of polyadenylated U snRNA tran- 
scripts, which are normally not polyadenylated’’. Indeed, analysis of 
the polyadenylated transcripts revealed a robust increase in polyade- 
nylation of eRNAs in the absence of Integrator (Fig. 3f, g). These 
results support a role for Integrator in cleaving the 3’ end of eRNAs,
leading to termination of transcription.
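The fold accumulation of unprocessed eRNA measured by real-time PCR can be expressed with the comparative Ct (2^−ΔΔCt) method. This is a hedged illustration with several assumptions. Amplification efficiency is taken as about 100% (E = 2). A reference amplicon such as GUSB, the cDNA loading control named in Fig. 3e, is used for normalization. A hypothetical "readthrough index" compares the unprocessed (u) amplicon with the total (t) amplicon in the same cDNA. None of the function names come from the paper.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control, efficiency=2.0):
    """Comparative Ct (Livak) fold change of a target, treated vs control.

    Each target Ct is normalized to a reference amplicon in the same sample
    (delta Ct); the fold change is E ** -(dCt_treated - dCt_control).
    """
    d_treated = ct_target_treated - ct_ref_treated
    d_control = ct_target_control - ct_ref_control
    return efficiency ** -(d_treated - d_control)


def readthrough_index(ct_unprocessed, ct_total, efficiency=2.0):
    """Hypothetical ratio of unprocessed (u) to total (t) amplicon in one sample,
    assuming both primer pairs amplify with the same efficiency."""
    return efficiency ** -(ct_unprocessed - ct_total)
```

Dividing the readthrough index of a dox sample by that of the matched ctrl sample gives a fold accumulation of unprocessed transcript that is independent of cDNA loading.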
We surmised that such a termination defect might result in the inab-
ility of RNAPII to dissociate from the eRNAs, leading to accumulation of
RNAPII-eRNA complexes and a consequent decrease in mature eRNA
levels. We performed ultraviolet (UV) cross-linking followed by RNA
immunoprecipitation (UV-RIP) using antibodies against RNAPII to
examine whether the association of eRNAs with RNAPII increases after depletion
of Integrator. Consistent with a role for Integrator in the processing of
eRNAs, depletion of Integrator led to a profound increase in eRNA
engagement with RNAPII following induction with EGF (Extended Data
Fig. 8b-d). We found similar results after analysis of RNAPII interaction
with the eRNAs at the ATF3 super-enhancer (Extended Data Fig. 8e-g).
Taken together, these results implicate the Integrator complex in the
termination of eRNAs and highlight Integrator’s role in the release of
eRNA transcripts from transcribing RNAPII.

Figure 2 | Integrator is required for enhancer-
promoter interaction. a, Diagrams of NR4A1
(left) and DUSP1 (right) genomic regions with
their respective enhancers (shown in red). The
arrowheads depict the position of primers for
detection of chromatin looping and the stick bars
indicate enzyme digestion sites (named N1-6 and
D1-5). E refers to the anchor primer at the
enhancer sites; control sites are also indicated.
b, Looping events between the promoter region
of NR4A1 and its enhancer were detected at N3,
N4 and N5 sites after EGF induction (left). A
similar interaction was also captured between
sites D3 and D4 of the DUSP1 promoter and its
downstream enhancer after EGF induction
(right). c, Knockdown of Integrator abolished
chromosomal looping events at both NR4A1 and
DUSP1 sites. The interaction frequency between
the anchoring points and the distal fragments was
determined by real-time PCR and normalized to
BAC templates. All sites were assayed in three
independent experiments (P < 0.01, two-sided
t-test). Control anchors are displayed in Extended
Data Fig. 5.

Figure 3 | Integrator has a role in termination of
eRNAs. a, b, RNAPII dynamics were analysed by
ChIP-seq and GRO-seq at the enhancer regions
adjacent to NR4A1 and DUSP1 (a) and at the
super-enhancer upstream of DUSP5 (b). The y axis
represents the read counts normalized to
sequencing depth. c, 3’-end cleavage of eRNAs was
examined with semi-quantitative PCR. Primer
pairs were designed to amplify a portion of the
enhancer transcript as detected in the control
GRO-seq experiment (t, total) or a longer template
extending further towards the 3’ end of the enhancer
region (u, unprocessed). d, PCR analysis was
performed in two independent replicates, before
(ctrl) and after (dox) depletion of INTS11 at three
eRNAs (sense and antisense strand). e, The
housekeeping gene GUSB was used as a cDNA
loading control. f, Polyadenylation of RNAs
increases after depletion of Integrator at DUSP1
and CCNL1 enhancer loci. The polyadenylated
fraction of RNA from whole-cell lysates was
sequenced after EGF stimulation, before and after
depletion of INTS11 (dox). g, Box plot shows a
significant increase in polyadenylated RNA reads
(P < 0.001) across the entire set of EGF-responsive
enhancers. Whiskers on the box plot indicate
the variability in the datasets.

The catalytic core of Integrator is a heterodimer of INTS11 and
INTS9, which are closely homologous to CPSF73 and
CPSF100, respectively”. We previously showed that a single point
mutation (E203Q) in the catalytic domain of INTS11 leads to impaired
processing of small nuclear RNAs’. To assess the impact of INTS11
enzymatic activity on eRNA biogenesis, we developed wild-type and
mutant INTS11 (E203Q) constructs that would be refractory to the action of
shRNAs against INTS11, and used these constructs to perform rescue
experiments. While ectopic expression of wild-type INTS11 could
substantially rescue the EGF-induced eRNA levels after depletion of
INTS11, the single-point catalytic mutant was without any effect
(Fig. 4a and Extended Data Fig. 9a). Interestingly, we observed a sim- 
ilar rescue of the transcriptional activation of EGF-induced genes by 
the wild-type INTS11 and not its catalytic mutant (Fig. 4b). These 
results not only demonstrate the requirement of INTS11 catalytic 
activity in regulating the induction of eRNAs but also highlight the 
defect in eRNA processing as a contributing factor in the loss of tran- 
scriptional responsiveness. 

To determine the scope of Integrator function at active enhancers,
we analysed the 2,029 transcriptionally active enhancers in HeLa cells.
We ranked the enhancers based on their transcriptional activity, which
mirrored RNAPII occupancy (Fig. 4c). Notably, depletion of
Integrator resulted in processing defects at all active enhancers, as
reflected by the broadening of GRO-seq and RNAPII ChIP-seq reads
commensurate with the transcriptional activity of each enhancer site
(Fig. 4c). This was in contrast to GRO-seq and RNAPII profiles at
transcriptionally active protein-coding genes (Extended Data Fig. 9b).
These results demonstrate a general role for Integrator in the proces-
sing of eRNAs at enhancers (Fig. 4d).

Figure 4 | Integrator has a global role in
enhancer regulation. a, Ectopic expression of
wild-type INTS11, and not its catalytic mutant
(E203Q), following Integrator depletion can rescue
eRNA induction by EGF. b, A similar rescue was
observed for wild-type INTS11 on the target
protein-coding genes. Real-time PCR analysis was
performed on CCNL1 and DUSP1 eRNAs and their
corresponding mRNAs before and after EGF
stimulation. Each eRNA was assayed with two sets
of primers. Error bars represent ± s.e.m. (n = 3
independent biological experiments), **P < 0.01
by two-sided t-test. c, The heat map shows
2,029 enhancer regions identified as extragenic
RNAPII-bound loci enriched in H3K27 acetylation (see
Methods). Enhancers were centred at the middle of
the RNAPII peak and ranked by transcriptional
activity (GRO-seq). The distributions of p300 and
H3K27ac are consistent with a group of active
enhancers. Upon Integrator depletion, nascent
RNA reads and RNAPII profiles spread beyond the
normal 3’ end of eRNAs. d, Model for the role of
Integrator at eRNAs. Stimulation of serum-starved
cells with EGF triggers recruitment of RNAPII
and Integrator to enhancer sites and induces
bi-directional transcription of non-polyadenylated
eRNAs. Upon EGF stimulation, Integrator
travels along the enhancers with RNAPII to
promote endonucleolytic cleavage of nascent
transcripts, leading to release of the mature eRNAs.
Depletion of Integrator elicits a cleavage defect
leading to faulty termination, which results in
extended eRNA transcripts and accumulation of
RNAPII.

Recent genome-wide studies have revealed the presence of RNAPII 
at active enhancers coincident with expression of these regulatory 
elements as long non-coding RNAs*”’. Importantly, such eRNAs have 
been shown to have critical roles in transcriptional induction by a 
variety of signal transduction pathways’*'''*. We show that 
Integrator is the molecular machine that is recruited to enhancers in 
a signal-dependent manner and is required for the induction of 
eRNAs. We surmise that the defect in 3’-end processing following 
Integrator depletion leads to a termination defect reflected in increased 
levels of primary eRNA transcripts. It is also likely that Integrator 
affects the stability of the mature transcripts, since its depletion leads 
to changes in steady-state levels of mature eRNAs. 

Similar to other regulatory complexes, Integrator is also recruited to 
the promoters of protein-coding genes including IEGs'****. Interes- 
tingly, recent reports described an association between Integrator and 
transcriptional pause release factors, negative elongation factor 
(NELF) and SPT4-SPT5 complexes’*'*****. NELF was also reported
to associate with eRNAs in neuronal cells”°®. Indeed, we found that 
Integrator depletion resulted in a defect in transcriptional initiation 
as well as pause release, which was reflected in the loss of responsive- 
ness of IEGs to EGF stimulation’*. However, depletion of NELF sub- 
units did not affect eRNA induction (Extended Data Fig. 7c, d). 
Moreover, Integrator depletion did not change NELF occupancy at 
EGF-induced enhancers (Extended Data Fig. 7e). Taken together, our 
results point to multiple functions for Integrator in the regulation of protein-coding
genes. While Integrator at promoters regulates pause release factors, 
leading to modulation of productive transcriptional elongation, 
Integrator at enhancers governs eRNA maturation and enhancer- 
promoter communication. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. 

Genome-wide data. High-throughput sequencing data analysed in this study are 
originally described in ref. 18 and are deposited at the Gene Expression Omnibus 
with accession number GSE40632. 

H3K27ac, H3K4me1 and p300 data sets from HeLa-S3 cells are available as part
of the ENCODE project”® and can be retrieved under the following accession
numbers: GSM733684, GSM798322, GSM93550. Additional experiments are
deposited at GEO (GSE68401) and include RNA-seq data (chromatin-bound
RNA, polyadenylated and non-polyadenylated fractions of total RNA) as well as
ChIP-seq experiments (acetylation of H3K27 and occupancy of NELFA). Every
genome-wide experiment was performed in two independent biological replicates.
Genome-wide identification of eRNA loci. Peak analysis of RNAPII ChIP-seq
data after EGF stimulation was performed using HOMER 4.6 (run in ‘factor’
mode). Next, we used the BEDtools suite to discard any peak overlapping:
(i) any exon from the Hg19 UCSC Known Genes (extended by 2 kb on either
side of every exon); (ii) RNA genes (from the Hg18 genome annotation table,
extended by 1 kb); (iii) tRNA genes (Hg19, extended by 1 kb). We further
selected peaks overlapping (±400 bp) with H3K27ac peaks from the ENCODE
ChIP-seq obtained in HeLa-S3 (GEO GSE31477). The analysis resulted in 2,029
regions that were further examined for their transcriptional response to EGF.
Briefly, we centred a 6-kb window at the midpoint of the RNAPII peak and used
HOMER 4.6 to calculate RPKM across the entire eRNA locus using ChromRNA-
seq data before and after EGF induction. We selected a group of 225 EGF-indu-
cible eRNAs displaying a fold change greater than 2 (ctrl versus EGF) and iden-
tified the nearest EGF-regulated gene (RPKM fold change >1.6). Of these, 91
EGF-induced enhancer RNAs located within 500 kb of the nearest EGF-responsive
protein-coding gene were selected for further analysis.
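The selection steps above can be sketched in Python to make each filter explicit. This is a minimal illustration, not the authors' HOMER/BEDtools pipeline. Intervals are plain 0-based half-open records, the ±2 kb, ±1 kb and ±400 bp pads follow the text, and RPKM uses its standard definition. The following are assumptions: the fold-change pseudocount, measuring distance from the window midpoint to the gene TSS, and all function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval, pad: int = 0) -> bool:
    """True if a overlaps b after b is extended by `pad` bp on each side."""
    return a.chrom == b.chrom and a.start < b.end + pad and b.start - pad < a.end


def candidate_enhancers(peaks, exons, rna_genes, trna_genes, h3k27ac_peaks):
    """Keep extragenic RNAPII peaks that lie within 400 bp of an H3K27ac peak."""
    kept = []
    for p in peaks:
        if any(overlaps(p, e, 2000) for e in exons):
            continue  # within 2 kb of an exon
        if any(overlaps(p, g, 1000) for g in rna_genes + trna_genes):
            continue  # within 1 kb of an RNA or tRNA gene
        if any(overlaps(p, h, 400) for h in h3k27ac_peaks):
            kept.append(p)
    return kept


def centred_window(peak: Interval, width: int = 6000) -> Interval:
    """6-kb window centred on the midpoint of an RNAPII peak."""
    mid = (peak.start + peak.end) // 2
    return Interval(peak.chrom, max(0, mid - width // 2), mid + width // 2)


def rpkm(reads: int, length_bp: int, total_mapped: int) -> float:
    """Reads per kilobase per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped)


def select_induced(windows, genes, min_fc=2.0, gene_fc=1.6,
                   max_dist=500_000, pseudo=0.1):
    """windows: [(Interval, rpkm_ctrl, rpkm_egf)];
    genes: [(name, chrom, tss, rpkm_fold_change)].
    Returns [(window, nearest EGF-responsive gene, distance in bp)]."""
    selected = []
    for iv, r_ctrl, r_egf in windows:
        if (r_egf + pseudo) / (r_ctrl + pseudo) <= min_fc:
            continue  # not EGF-inducible
        mid = (iv.start + iv.end) // 2
        hits = [(abs(tss - mid), name) for name, chrom, tss, fc in genes
                if chrom == iv.chrom and fc > gene_fc]
        if hits:
            dist, name = min(hits)  # nearest EGF-responsive gene
            if dist <= max_dist:
                selected.append((iv, name, dist))
    return selected
```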

ChIP-seq data analysis. ChIP-seq data were obtained using HiSeq 2000 and
NextSeq 500. Reads were aligned to the human genome hg19 using bowtie2””
(end-to-end alignment, sensitive option). Snapshots of raw ChIP-seq data pre-
sented throughout the figures were obtained as follows: BigWig files for every
ChIP-seq experiment were generated using samtools, bedtools and RSeQC’; these
tracks were then uploaded to the UCSC Genome Browser (hg19).

Clustering, heat maps and average density analysis. ChIP-seq, GRO-seq and 
RNA-seq data were subjected to read density analysis; seqMINER 1.3.3” was used 
to extract read densities at all enhancer loci with the following parameters: 5’
extension = 4 kb, 3’ extension = 4 kb, no read extension, total bin number = 180
bins. Mean density profiles were then generated in R 3.0.1 and normalized to
sequencing depth. Heat maps were generated with ChAsE
(http://chase.cs.univie.ac.at/), using default parameters, a 10 kb window and
400 bins, and with ngsplot*’.
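The density-profile step can be sketched as follows: 4 kb on either side of each enhancer centre, 180 bins, and normalization to sequencing depth. This is a hedged illustration, not seqMINER's implementation. Each read is represented by a single genomic coordinate, with no read extension as stated in the text. "Normalized to sequencing depth" is assumed to mean reads per million mapped reads, averaged across regions. The function names are hypothetical.

```python
def bin_counts(read_positions, centre, flank=4000, n_bins=180):
    """Count reads in n_bins equal bins spanning [centre - flank, centre + flank)."""
    width = 2 * flank
    lo = centre - flank
    counts = [0] * n_bins
    for pos in read_positions:
        offset = pos - lo
        if 0 <= offset < width:
            # Integer arithmetic avoids floating-point errors at bin edges.
            counts[offset * n_bins // width] += 1
    return counts


def mean_profile(regions, total_mapped, flank=4000, n_bins=180):
    """Average depth-normalized profile over regions.

    regions: list of (centre, read_positions). Each bin is reported as reads
    per million mapped reads, averaged across regions.
    """
    profile = [0.0] * n_bins
    for centre, reads in regions:
        for i, c in enumerate(bin_counts(reads, centre, flank, n_bins)):
            profile[i] += c
    scale = 1e6 / total_mapped / len(regions)
    return [v * scale for v in profile]
```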

qChIP. ChIP was performed in HeLa cells as already described'*. Cells were cross-
linked with 1% formaldehyde for 10 min at room temperature, harvested and
washed twice with 1× PBS. The pellet was resuspended in ChIP lysis buffer
(150 mM NaCl, 1% Triton X-100, 0.7% SDS, 500 µM DTT, 10 mM Tris-HCl,
5 mM EDTA) and chromatin was sheared to an average length of 200–400 bp
using a Bioruptor sonication device (20 min with 30 s intervals). The chromatin
lysate was diluted with SDS-free ChIP lysis buffer and aliquoted into single immu- 
noprecipitations of 2.5 X 10° cells each. A specific antibody or a total rabbit IgG 
control was added to the lysate along with Protein A magnetic beads (Invitrogen)
and incubated at 4 °C overnight. On day 2, beads were washed twice with each of
the following buffers: Mixed Micelle Buffer (150 mM NaCl, 1% Triton X-100, 0.2%
SDS, 20 mM Tris-HCl, 5 mM EDTA, 65% sucrose), Buffer 500 (500 mM NaCl, 1%
Triton X-100, 0.1% Na deoxycholate, 25 mM HEPES, 10 mM Tris-HCl, 1 mM
EDTA), LiCl/detergent wash (250 mM LiCl, 0.5% Na deoxycholate, 0.5% NP-
40, 10 mM Tris-HCl, 1 mM EDTA) and a final wash with 1×
TE. Finally, beads were resuspended in 1× TE containing 1% SDS and incubated
at 65 °C for 10 min to elute immunocomplexes. Elution was repeated twice, and
the samples were further incubated overnight at 65 °C to reverse cross-linking,
along with the untreated input (2.5% of the starting material). After treatment with
0.5 mg ml−1 proteinase K for 3 h, DNA was purified with the Wizard SV Gel and PCR
Clean-Up System (Promega). ChIP eluates and input were assayed by real-time
quantitative PCR in a 20 µl reaction with the following: 0.4 µM of each primer,
10 µl of iQ SYBR Green Supermix (Bio-Rad), and 5 µl of template DNA (corres-
ponding to 1/40 of the elution material) using a CFX96 real-time system
(Bio-Rad). Thermal cycling parameters were: 3 min at 95 °C, followed by 40 cycles
of 10 s at 95 °C, 20 s at 63 °C and 30 s at 72 °C.
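The protocol does not spell out how qChIP signals were quantified. One common way to express ChIP eluates relative to a 2.5% input is the percent-input method. The sketch assumes the following: ChIP and input samples were eluted and sampled identically, so only the 2.5% input fraction needs correcting; the amplification efficiency is 2; and the function names are hypothetical. It is an illustration, not the authors' analysis.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.025, efficiency=2.0):
    """Percent of input chromatin recovered in a ChIP.

    The input Ct is adjusted to represent 100% of the starting chromatin by
    subtracting log_E(1 / input_fraction) (about 5.32 cycles for a 2.5% input).
    """
    ct_input_adj = ct_input - math.log(1.0 / input_fraction, efficiency)
    return 100.0 * efficiency ** (ct_input_adj - ct_ip)


def fold_over_igg(ct_ip, ct_igg, efficiency=2.0):
    """Enrichment of a specific antibody over the rabbit IgG control."""
    return efficiency ** (ct_igg - ct_ip)
```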

Subcellular fractionation. Subcellular fractionation was performed as described®,
with minor changes. Cells were lysed in cold lysis buffer with
0.15% NP-40, and the sucrose buffer was used to isolate nuclei. Glycerol buffer
(20 mM Tris pH 7.9, 75 mM NaCl, 0.5 mM EDTA, 50% glycerol, 0.85 mM DTT)
and nuclei lysis buffer (20 mM HEPES pH 7.6, 7.5 mM MgCl2, 0.2 mM EDTA,
0.3 M NaCl, 1 M urea, 1% NP-40, 1 mM DTT) were used to isolate the nucleoplasmic
fraction and the chromatin-bound RNA fraction. Chromatin-bound RNA was iso-
lated using the TRIzol protocol.

RNA isolation for high-throughput sequencing. Total RNA or chromatin-
bound RNA was extracted using TRIzol reagent (Life Technologies). Genomic
DNA and ribosomal RNA were removed with the TURBO DNA-free kit and the
RiboMinus Eukaryote Kit (Life Technologies). The polyA and non-polyA frac-
tions were isolated by running RNA samples three times through Oligo(dT)
Dynabeads (Life Technologies) to ensure complete separation. The resulting RNA
fractions were subjected to strand-specific library preparation using the NEBNext
Ultra Directional RNA Library Prep Kit for Illumina (New England Biolabs).
Sequencing was performed on a NextSeq 500 (Illumina).

ChIP-seq. ChIP-sequencing was performed as previously described®. 1 X 10’ cells 
were crosslinked in 1% formaldehyde for 10 min and sonicated with a Bioruptor to 
obtain chromatin fragments of 200-300 bp. Immunoprecipitation was performed 
overnight with the specific antibodies and Dynabeads Protein A or Protein G 
beads (Life Technologies). Beads were washed and chromatin fragments were 
eluted in TE with 1% SDS at 65°C. After de-crosslinking overnight, DNA was 
extracted using Wizard SV extraction columns (Promega), and Illumina sequen-
cing libraries were prepared using the NEBNext ChIP-Seq Library Prep Reagent
Set (New England Biolabs) following the manufacturer's instructions. Libraries
were assayed on a Bioanalyzer (High Sensitivity DNA kit) and sequenced on a
NextSeq 500 (Illumina).

Antibodies. Chromatin immunoprecipitation was performed with polyclonal 
antibodies against INTS11, INTS9, INTS1 (Bethyl, A301-274A, A300-412A, 
A300-361A). ChIP-seq of NELFA and H3K27ac were performed with goat poly- 
clonal antibodies (Santa Cruz, sc-23599) and rabbit polyclonal antibodies (Abcam, 
ab4729), respectively. 

Antibodies used for immunoblot analysis were: γ-tubulin (Santa Cruz, mouse
monoclonal, sc-17788), CBP80 (Santa Cruz, mouse monoclonal, sc-271304),
INTS1 (Bethyl, rabbit polyclonal, A300-361A) and a proprietary rabbit polyclonal
raised against the C terminus of INTS11. Flag M2-conjugated beads (Sigma,
A2220) were used for immunoprecipitation. 

Chromosome conformation capture (3C). The 3C assay was performed as previously described, with minor changes. HeLa cells were filtered through a 70 μm strainer to obtain a single-cell preparation. 1 × 10^7 cells were then fixed in 1% formaldehyde for 30 min at room temperature for cross-linking. The reaction was quenched with 0.25 M glycine and cells were collected by centrifugation at 240g for 8 min at 4 °C. The cell pellet was lysed in 5 ml cold lysis buffer (10 mM Tris-HCl, pH 7.5; 10 mM NaCl; 5 mM MgCl2; 0.1 mM EGTA) with freshly added protease inhibitors (Roche) on ice for 15 min. Isolated nuclei were collected by centrifugation at 400g for 5 min at 4 °C, then resuspended in 0.5 ml of 1.2× restriction enzyme buffer (NEB) with 0.3% SDS and incubated for 1 h at 37 °C while shaking at 900 r.p.m. Next, samples were incubated for 1 h at 37 °C after addition of 2% (final concentration) Triton X-100. 400 U of restriction enzyme was added to the nuclei, followed by incubation at 37 °C overnight. 10 μl of each sample was collected before and after the enzyme reaction to evaluate digestion efficiency. The reaction was stopped by addition of 1.6% SDS (final concentration) and incubation at 65 °C for 30 min while shaking at 900 r.p.m. The sample was then diluted 10-fold with 1.15× ligation buffer (NEB) and 1% Triton X-100 and incubated for 1 h at 37 °C while shaking at 900 r.p.m. 400 U of T4 DNA ligase (NEB) was added to the sample, and the reaction was carried out at 16 °C for 4 h, followed by 30 min at room temperature. For each sample, 300 μg of proteinase K was added for protein digestion and de-crosslinking at 65 °C overnight. On the next day, RNA was removed by adding 300 μg of RNase and incubating the sample for 1 h at 37 °C. DNA was purified twice by phenol-chloroform extraction and ethanol precipitation. Purified DNA was then analysed by conventional or quantitative PCR. As a control for ligation products, the BAC clones were digested with 10 U of restriction enzyme overnight and then incubated with 10 U T4 DNA ligase at 16 °C overnight. The DNA was extracted by phenol-chloroform and precipitated with ethanol. Purified DNA was then analysed by conventional or quantitative PCR. For real-time PCR, the ΔCt method was applied for analysing data, using the BAC-clone Ct values as control. Primer sequences for PCR are listed in Supplementary Table 2. BAC clone IDs: RP-11-294A10, RP-11-1107P14, RP-11-1068G13 (Empire Genomics).
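The ΔCt normalization against the BAC-clone control can be sketched as follows. This is a minimal illustration, not the authors' analysis script. It assumes roughly 100% amplification efficiency (one doubling per cycle) and that replicate Ct values are averaged before exponentiation. The function name `relative_ligation_frequency` is hypothetical.

```python
from statistics import mean


def relative_ligation_frequency(ct_3c, ct_bac):
    """Relative 3C ligation frequency by the ΔCt method (illustrative sketch).

    ct_3c:  replicate Ct values for one ligation product amplified from the 3C library
    ct_bac: replicate Ct values for the same primer pair on the BAC-clone control
    Assumes ~100% PCR efficiency, so each cycle of difference is a factor of 2.
    """
    delta_ct = mean(ct_3c) - mean(ct_bac)  # ΔCt = Ct(3C) - Ct(BAC)
    return 2.0 ** (-delta_ct)              # lower Ct (more template) gives a higher ratio


# Example: a 3C product that crosses threshold 3 cycles after the BAC control
print(relative_ligation_frequency([25.0, 25.0], [22.0, 22.0]))
```

Normalizing to the BAC template, where every ligation junction is present at a defined ratio, is intended to correct for differences in primer-pair efficiency between junctions.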

Pol II RNA immunoprecipitation. RIP was performed as previously described. HeLa cells were UV-crosslinked at 254 nm (200 mJ cm^-2) in 10 ml ice-cold PBS and collected by scraping. Cells were incubated in lysis solution (0.1% SDS, 0.5% NP40, 0.5% sodium deoxycholate, 400 U ml^-1 RNase inhibitor (Roche)) with protease inhibitors at 4 °C for 25 min with rotation, followed by DNase treatment (30 U of DNase, 15 min at 37 °C). Protein A Dynabeads (Invitrogen) were incubated with 2 μg Pol II antibody (Santa Cruz, N-20) and the cell lysate at 4 °C overnight. RNA was extracted from the purified protein-RNA complex using the TRIzol method and subjected to RT-qPCR with the corresponding primers.

Inducible cell lines. Inducible INTS11 and INTS1 knockdown clones were generated from HeLa cells using the Tet-pLKO-puro vector. For EGF induction, cells were serum-starved in 0.5% FBS for 48 h and treated with 100 ng ml^-1 EGF (Invitrogen) for the indicated time course. All cell lines in this study tested negative for mycoplasma.

Transfections. Cells were treated with doxycycline for 48 h. 24 h before EGF induction, expression plasmids for wild-type INTS11 and the INTS11(E203Q) mutant were transfected using Lipofectamine 2000 (Life Technologies, Inc.) according to the manufacturer's instructions. Cells were harvested 0 and 20 min after EGF induction.

All PCR primer sequences are listed in Supplementary Table 2.
Extended Data Figure 1 | Identification of eRNAs responsive to EGF. a, We identified 91 EGF-responsive enhancer regions in HeLa cells. We annotated extragenic RNAPII sites (see Methods) and used the middle of the RNAPII peak as an anchor to display average profiles of p300, H3K27ac and H3K4me1 (data from the ENCODE project). The profiles represent the mean read density of ChIP-seq data. The 91 loci display a typical enhancer signature, with enrichment of p300 and H3K27ac around the TSS and a broader decoration by H3K4me1. b, Profiles of H3K27ac were obtained from ChIP-seq analysis of HeLa cells before and after 20 min of EGF induction. Mean read density was normalized to sequencing depth. c, EGF stimulates bi-directional transcription from 91 enhancer regions. We displayed the mean read density obtained from strand-specific sequencing of the chromatin-bound RNA fraction (ChromRNA-seq). d, e, Normalized read density (RPKM) was calculated from RNA-seq data for 91 eRNAs (d) and 57 neighbouring protein-coding genes (e) that responded to EGF stimulation (FC > 1.6) and mapped within 500 kb of an EGF-responsive eRNA. f, Average profiles of ChromRNA-seq data at 91 enhancer loci (mean density of reads, normalized to total read number). g, h, Box plots of 91 eRNAs before and after treatment with EGF show the average increase of transcription 20 min after stimulation (P < 0.001), matched by an increase in the neighbouring protein-coding genes (P < 0.02). i, NR4A1 is activated by EGF in HeLa cells: RNAPII and INTS11 are recruited to the NR4A1 locus after 20 min of stimulation, with concomitant accumulation of reads from RNA-seq and ChromRNA-seq. A neighbouring eRNA locus also exhibits increased transcription along with RNAPII and INTS11 recruitment. Sequencing tracks are visualized in BigWig format and aligned to the hg19 assembly of the UCSC Genome Browser. Whiskers on the box plots indicate the variability in the datasets.
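The RPKM normalization and the gene-selection criteria in panels d and e (fold change > 1.6, within 500 kb of an EGF-responsive eRNA) can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' pipeline. The dictionary keys (`name`, `chrom`, `tss`, `rpkm_0`, `rpkm_egf`) are hypothetical. Distance is assumed to be measured from the gene TSS, and genes with zero baseline expression are skipped.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def egf_responsive_neighbours(genes, erna_positions,
                              min_fold_change=1.6, max_distance_bp=500_000):
    """Return names of genes induced by EGF that lie near an EGF-responsive eRNA.

    genes: list of dicts with hypothetical keys 'name', 'chrom', 'tss',
           'rpkm_0' (before EGF) and 'rpkm_egf' (after EGF)
    erna_positions: list of (chrom, position) tuples for responsive eRNAs
    Assumption: distance is measured from the gene TSS to the eRNA position.
    """
    selected = []
    for g in genes:
        if g["rpkm_0"] <= 0:
            continue  # fold change undefined without baseline expression
        fold_change = g["rpkm_egf"] / g["rpkm_0"]
        near_erna = any(chrom == g["chrom"] and abs(pos - g["tss"]) <= max_distance_bp
                        for chrom, pos in erna_positions)
        if fold_change > min_fold_change and near_erna:
            selected.append(g["name"])
    return selected
```

A pseudocount could be used instead of skipping zero-baseline genes. Which choice the original analysis made is not stated.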
Extended Data Figure 2 | EGF-induced eRNAs are predominantly non-polyadenylated. a, We examined transcription at three enhancers adjacent to EGF-responsive genes CCNL1, NR4A1 and DUSP1. Total RNA samples were collected before and after EGF induction. Reverse transcription was performed with random hexamer primer or oligo d(T) primer. Each eRNA strand was analysed by real-time PCR with specific primers. Error bars represent ± standard error of the mean (s.e.m., n = 3 biological independent experiments). P < 0.01 by two-sided t-test. b, c, RNA-seq was performed on the polyadenylated and non-polyadenylated fractions of total RNA. RNA-seq tracks were visualized in BigWig format and aligned to the hg19 assembly of the UCSC Genome Browser. CCNL1 and DUSP1 enhancers are displayed (b) along with a polyadenylated control (DUSP1 protein-coding locus) and a non-polyadenylated transcript (snRNA U12) (c). All EGF-induced eRNAs and protein-coding genes (RefSeq hg19) were examined for their average RPKM throughout the entire locus. d, We compared polyadenylation levels of 225 eRNAs and 150 protein-coding genes (2-fold induction upon EGF; RPKM calculated from the ChromRNA-seq data described previously). The box plot shows a predominance of non-polyadenylated transcripts mapping to eRNA loci, as opposed to transcripts coding for RefSeq genes. Whiskers on the box plot indicate the variability in the datasets.
Extended Data Figure 3 | The Integrator complex is recruited to enhancers upon EGF stimulation. a, qChIP analysis of Integrator occupancy using INTS11, INTS1 and INTS9 antibodies at four eRNA loci. Data were collected during a time course of EGF induction in HeLa cells (0, 20, 40 and 60 min). Error bars represent ± standard error of the mean (s.e.m., n = 3 biological independent experiments). P < 0.01 by two-sided t-test. b, Protein levels of INTS1 and INTS11 after depletion in tet-inducible HeLa clones. The arrow indicates the
of H3K27 acetylation (0 min/20 min EGF) before (ctrl) and after (dox) depletion of INTS11. Data were calculated from the read density of ChIP-seq experiments across EGF-induced enhancers. Depletion of Integrator significantly affects the EGF-dependent increase in H3K27ac (P < 0.05). Whiskers on the box plot indicate the variability in the datasets.
Extended Data Figure 4 | Depletion of Integrator impairs activation of eRNAs by EGF. a, b, Activation of eRNAs near the DUSP1, CCNL1 and NR4A1 genes was assayed by qRT-PCR in three independent experiments, using INTS11 (a) or INTS1 (b) inducible shRNA clones. Transcription was followed throughout a 20-min time-course experiment. Each eRNA was amplified with two different sets of specific primers to analyse both strands; dashed lines indicate treatment with doxycycline (dox) to induce shRNAs. Data at every time point are reported as fold change (EGF/non-induced). Error bars represent ± s.e.m. (n = 3 biological independent experiments), P < 0.01 by two-sided t-test. c, Schematic representation of ATF3 and its super-enhancer region located 30 kb upstream (top). Snapshots of ChIP-seq and RNA-seq tracks show EGF-dependent recruitment of RNAPII and INTS11 at the ATF3 locus and at several upstream enhancers. Depletion of INTS11 nearly abolished transcription of the enhancer RNAs and ATF3 mRNA. d, Real-time RT-PCR analysis of the ATF3 super-enhancer region upon depletion of INTS11. qPCR analysis was performed before and 5, 10, 15 and 20 min after EGF treatment with strand-specific primer sets (indicated below the RNA-seq tracks in c). Error bars represent ± s.e.m. (n = 3 biological independent experiments), P < 0.01 by two-sided t-test.
74 kb upstream of the NR4A1 protein-coding gene and the Con2 site is located 42 kb downstream of the enhancer site. There are no looping events between
and a downstream control site (Con). All data were averaged from three independent experiments, P < 0.01 by two-sided t-test.
Extended Data Figure 6 | Integrator has a role in eRNA termination. a, Mean density profiles of GRO-seq data at 91 EGF-induced enhancers. Data are presented as strand-specific mean read density, centred at the middle of the RNAPII peak and normalized to sequencing depth. The underlying box plots were used to quantify the enrichment of GRO-seq reads at the 3′ end of both eRNA transcripts (2 kb window, centred 1 kb downstream of the RNAPII peak). b, RNAPII profiling at 91 enhancers after INTS11 depletion shows accumulation of ChIP-seq reads towards the 3′ end. Data are presented as mean read density, centred at the middle of the RNAPII peak and normalized to sequencing depth. Box plots represent the enrichment of RNAPII reads for both eRNA transcripts (2 kb window, centred 1 kb downstream of the RNAPII peak). RNAPII significantly accumulated (P < 0.004) after depletion of INTS11. Whiskers on the box plots indicate the variability in the datasets. c, d, The RNAPII travelling ratio at enhancers was measured as the ratio between RNAPII density close to the transcription start site (the surrounding 300 bp) and 3 kb downstream. Given the bi-directional nature of transcription at enhancers, the travelling ratio was calculated for both sense (c) and antisense (d) transcripts.
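The strand-aware travelling-ratio calculation described in panels c and d can be sketched as follows. The legend does not specify the exact downstream window. This sketch assumes it runs from the edge of the 300 bp TSS window out to 3 kb from the TSS. The function name and its parameters are hypothetical.

```python
import numpy as np


def travelling_ratio(coverage, tss, strand="+", tss_half_window=150, downstream_len=3000):
    """RNAPII travelling ratio: mean density around the TSS / mean density downstream.

    coverage: per-base RNAPII read density along the region (1D array)
    tss: index of the transcription start site (here, the middle of the RNAPII peak)
    strand: '+' for sense (downstream = increasing index), '-' for antisense
    Assumption: the downstream window spans tss_half_window..downstream_len from the TSS.
    """
    cov = np.asarray(coverage, dtype=float)
    tss_window = cov[tss - tss_half_window: tss + tss_half_window]  # surrounding 300 bp
    if strand == "+":
        body = cov[tss + tss_half_window: tss + downstream_len]
    else:
        body = cov[tss - downstream_len: tss - tss_half_window]
    body_mean = body.mean()
    return tss_window.mean() / body_mean if body_mean > 0 else float("inf")


# Example: uniform background with a 10-fold RNAPII peak at the enhancer centre
cov = np.ones(10_000)
cov[4850:5150] = 10.0
print(travelling_ratio(cov, 5000, "+"), travelling_ratio(cov, 5000, "-"))
```

Computing the ratio separately for each strand mirrors the sense/antisense split in panels c and d. A lower ratio indicates that more polymerase has travelled away from the start site.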
Extended Data Figure 7 | Analysis of the super elongation complex at enhancers. a, b, Metagene analysis of 91 eRNA loci shows the effect of EGF stimulation and INTS11 depletion on the recruitment of the ELL2 (a) and AFF4 (b) subunits of the super elongation complex (SEC). SEC was recruited to enhancers upon EGF stimulation. Depletion of Integrator decreases AFF4 and ELL2 recruitment. Data were visualized as mean read density, normalized to sequencing depth, across 8 kb surrounding the centre of enhancers. c, To investigate the role of the negative elongation factor (NELF) in induction of eRNAs, we infected HeLa cells with lentiviral shRNAs against NELFA, NELFE and a control GFP. Quantitative RT-PCR analysis shows the extent of NELF depletion 72 h after infection. Error bars represent ± s.e.m. (n = 3 biological independent experiments), P < 0.01 by two-sided t-test. d, Depletion of two different NELF subunits does not significantly affect activation of EGF-responsive eRNAs. Data represent fold change of induction (EGF/not induced) after 20 min of stimulation and were normalized against GUSB expression. Error bars represent ± s.e.m. (n = 3 biological independent experiments), P < 0.01 by two-sided t-test. e, ChIP-seq analysis of NELFA before and after depletion of INTS11. Metagene analysis shows mean read density (normalized to sequencing depth) across 91 eRNAs. NELF occupancy at enhancers was not affected by depletion of Integrator.
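Panel d reports fold induction normalized against GUSB. A common way to compute such a value is the ΔΔCt method, sketched here with GUSB as the reference gene. The legend does not state the exact formula, so treat this as an assumption. It also assumes about 100% amplification efficiency. The function name is hypothetical.

```python
def fold_change_ddct(ct_target_egf, ct_ref_egf, ct_target_ctrl, ct_ref_ctrl):
    """Fold induction (EGF / not induced) by the ΔΔCt method.

    ct_target_*: Ct of the eRNA amplicon; ct_ref_*: Ct of the reference gene (e.g. GUSB)
    Assumes ~100% PCR efficiency for both amplicons.
    """
    d_egf = ct_target_egf - ct_ref_egf     # ΔCt after EGF, normalized to reference
    d_ctrl = ct_target_ctrl - ct_ref_ctrl  # ΔCt without EGF
    return 2.0 ** -(d_egf - d_ctrl)        # 2^-ΔΔCt


# Example: the eRNA gains 2 cycles relative to GUSB after EGF, i.e. ~4-fold induction
print(fold_change_ddct(24.0, 20.0, 26.0, 20.0))
```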


©2015 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 8 panels (image not reproduced): y axes, Fold Change and Fold enrichment; conditions CTRL, CTRL+EGF, DOX, DOX+EGF; series CCNL1 eRNA, CCNL1-S/-As, NR4A1-S/-As, ATF3.1-S/-As, ATF3.2-S/-As]


Extended Data Figure 8 | Integrator depletion causes accumulation of unprocessed eRNAs and prevents release of RNAPII. a, Termination of eRNAs was examined with quantitative RT-PCR. Primer pairs were designed to amplify a portion of the enhancer transcript detected under normal conditions (t, total) or a longer template extending further into the 3′ end of the enhancer region (u, unprocessed). qPCR analysis was performed before (ctrl) and after (dox) depletion of INTS11 at three eRNAs (sense and antisense strands), after stimulation with EGF. In the absence of INTS11, we observed accumulation of unprocessed eRNA, suggestive of a termination defect. Error bars represent ± s.e.m. (n = 3 biological independent experiments), P < 0.01 by two-sided t-test. Release of eRNA transcripts from RNA polymerase was investigated by means of RNAPII immunoprecipitation following UV cross-linking (UV-RIP). b-d, After RNAPII immunoprecipitation, eRNAs near the DUSP1, CCNL1 and NR4A1 genes were assayed by qRT-PCR and showed increased association with RNAPII in the absence of Integrator. Each eRNA was detected by two different sets of specific primers (sense and antisense). Error bars represent ± s.e.m. (n = 3 biological independent experiments). *P < 0.01, **P < 0.01, ***P < 0.001 by two-sided t-test. e-g, RNAPII UV-RIP analysis was also performed on several eRNAs from the ATF3 super-enhancer. qRT-PCR on the RNA recovered after immunoprecipitation shows increased association between RNAPII and eRNAs in the absence of Integrator. Each eRNA was detected by two different sets of specific primers (sense and antisense). Error bars represent ± s.e.m. (n = 3 independent experiments). **P < 0.01 by two-sided t-test.
Extended Data Figure 9 | Distribution of RNAPII and nascent RNAs across protein-coding genes. a, Expression level of exogenous wild-type (WT) INTS11 and its catalytic mutant (E203Q). Nuclear extracts were subjected to Flag immunoprecipitation and probed with a polyclonal antibody raised against the C terminus of INTS11. b, Heat map of nascent RNA (GRO-seq) and RNAPII ChIP-seq across the 2,000 most active genes in HeLa cells. Gene loci were analysed over their entire gene body, with 3 additional kilobases on both ends. H3K27ac data from ENCODE are shown on the left; genes are ranked according to the intensity of RNAPII signal. Depletion of Integrator does not appear to affect termination at protein-coding genes.
Crystal structure of the dynamin tetramer

Thomas F. Reubold, Katja Faelber, Nuria Plattner, York Posor, Katharina Ketel, Ute Curth, Jeanette Schlegel, Roopsee Anand, Dietmar J. Manstein, Frank Noé, Volker Haucke, Oliver Daumke & Susanne Eschenburg


The mechanochemical protein dynamin is the prototype of the dynamin superfamily of large GTPases, which shape and remodel membranes in diverse cellular processes. Dynamin forms predominantly tetramers in the cytosol, which oligomerize at the neck of clathrin-coated vesicles to mediate constriction and subsequent scission of the membrane. Previous studies have described the architecture of dynamin dimers, but the molecular determinants for dynamin assembly and its regulation have remained unclear. Here we present the crystal structure of the human dynamin tetramer in the nucleotide-free state. Combining structural data with mutational studies, oligomerization measurements and Markov state models of molecular dynamics simulations, we suggest a mechanism by which oligomerization of dynamin is linked to the release of intramolecular autoinhibitory interactions. We elucidate how mutations that interfere with tetramer formation and autoinhibition can lead to the congenital muscle disorders Charcot-Marie-Tooth neuropathy and centronuclear myopathy, respectively. Notably, the bent shape of the tetramer explains how dynamin assembles into a right-handed helical oligomer of defined diameter, which has direct implications for its function in membrane constriction.

The three highly conserved vertebrate isoforms of dynamin contain five distinct domains (Extended Data Fig. 1a): an N-terminal GTPase (G) domain mediating nucleotide binding and hydrolysis, a bundle signalling element (BSE), a stalk, a pleckstrin homology (PH) domain involved in lipid binding, and a proline-rich domain (PRD) mediating interactions with scaffolding proteins containing BAR and SH3 domains. To exert its function in clathrin-mediated endocytosis (CME), dynamin assembles via the stalks into a helical array surrounding the necks of invaginating clathrin-coated pits. Dimerization of GTP-bound G domains from neighbouring helical rungs induces GTP hydrolysis. The ensuing conformational changes are thought to be transmitted from the G domain via the BSE to the stalk, resulting in a sliding motion of adjacent helix rungs, concomitant helix constriction, and eventually membrane scission. The inherent tendency of dynamin to form large assemblies at high protein concentrations has hampered its crystallization in the past. Previously, the use of non-oligomerizing mutants led to the determination of dynamin 1 crystal structures; however, the postulated higher-order assembly interface was not resolved in these structures, leaving the oligomerization mechanism unaddressed. We reasoned that an alternative assembly-affecting mutation, such as K361S in dynamin 3 (ref. 11), might disturb the oligomerization interface to a lesser extent than the previously used mutants. We obtained crystals of nucleotide-free dynamin 3(K361S) lacking the PRD (dynamin 3(K361S/ΔPRD)) that diffracted to 3.7 Å resolution (Methods, Extended Data Fig. 1, Extended Data Table 1). The asymmetric unit of the crystal lattice contained a dynamin tetramer that did not form the filamentous superstructures seen for dynamin 1 (refs 2, 3).

Figure 1 | Structure of the dynamin 3 tetramer. The four molecules in the tetramer are coloured separately; in the molecule on the right, each domain is individually coloured. The tetramer consists of two dimers, each formed via the central interface 2. The two dimers are connected via interfaces 1 (left box) and 3 (right box) to build the tetramer. One inner molecule is omitted from the detailed view for clarity.

The dynamin tetramer consists of two dimers, each of which assembles via the previously described interface 2 (refs 2, 3, 12-14) (Fig. 1, Extended Data Fig. 2a, b). Different dimerization and assembly models were derived from electron microscopy (EM) reconstructions and cross-linking experiments. These models, however, are not compatible with the architecture of the dynamin tetramer (Extended Data Fig. 2c). To provide further evidence for dimerization via interface 2, we introduced the triple mutation I481D/H677D/L678S in dynamin 3(ΔPRD) and the corresponding mutation (I481D/H687D/L688S) in dynamin 1(ΔPRD). These mutants were monomeric in analytical ultracentrifugation experiments (Extended Data Fig. 2d). Thus, dimerization via interface 2 is indeed a general feature of dynamin and dynamin-like proteins. This conclusion receives additional support from recent cross-linking data.

Dynamin 3 dimers further assemble into tetramers via interface 1 and interface 3 (Fig. 1 and Extended Data Fig. 3a). Interface 1 at the top of the stalk features four hydrophobic residues that are highly conserved in the dynamin superfamily (Fig. 1 and Supplementary Fig. 1). The main contributors to interface 3 are loop L1N^S (superscript S denotes belonging to the stalk) of the ‘inner’ and loop L2^S of the ‘outer’ stalks (Fig. 1), which mediate an intricate interaction network involving all four stalks (Extended Data Fig. 3a). Accordingly, these loop regions are well defined in the inter-dimer interface (Extended Data Fig. 3b), but not at the outer, non-assembled sides of the tetramer. Previous studies have shown that mutation of R399 in loop L2^S completely destroys higher-order assembly and dynamin function. In our structure, R399 of an outer molecule forms salt bridges to E410 in the α2^S helix and to E345 in L1N^S in the outer and inner molecules of the opposite dimer, respectively. In the hydrophobic core of interface 3, L402 and F403 in L2^S of outer molecules interact with F493, F496, L655 and T651 of outer molecules in the neighbouring dimer (Fig. 1 and Extended Data Fig. 3a). Mutation of F403 and of E410 yielded predominantly dimeric protein and compromised liposome binding as well as liposome-stimulated GTPase activity (Fig. 2a-c). In dynamin 2, the F403A mutation substantially interfered with CME, as monitored by transferrin internalization (Fig. 2d). The effect of E410A on CME was less pronounced (Fig. 2d), possibly because the structural defect is partly compensated by the second salt bridge that R399 forms to E345 (Fig. 1, Extended Data Fig. 3a).

L1N^S of an inner stalk also interacts with α1C^S of an outer stalk. Accordingly, mutation of N-terminal (S347A/G348A/D349A), central (Q350A/V351A/D352A/T353A) or C-terminal (L354A/E355A/L356A/S357A) residue stretches in L1N^S interfered with tetramerization (Fig. 2a). The central and C-terminal mutations, but not the N-terminal ones, compromised liposome binding and assembly-stimulated GTPase activity (Fig. 2b, c). The Q350A/V351A/D352A/T353A mutant showed a reduced ability to sustain CME of transferrin, whereas the L354A/E355A/L356A/S357A mutant displayed a dominant-negative effect in transferrin uptake assays (Fig. 2d). The Charcot-Marie-Tooth neuropathy-related mutation G358R (refs 16, 17) is located at the C terminus of L1N^S (Extended Data Fig. 3c). This mutation led to a dimeric mutant that did not bind to liposomes (Fig. 2a-c). Likewise, it exhibited a dominant-negative effect on CME (refs 16 and 17, and Fig. 2d). The bulky arginine side chain probably interferes with the proper binding conformation of L1N^S. Interestingly, the mutants L354A/E355A/L356A/S357A, G358R and F403A were still recruited to clathrin-coated pits; these pits, however, remained stable at the membrane surface (Extended Data Fig. 3d). Thus, the function of dynamin at clathrin-coated pits, but not its recruitment, depends on an intact interface 3.

Figure 2 | Interface 3 is crucial for assembly and function of dynamin. a, Sedimentation velocity experiments for dynamin 3 and the indicated mutants. The following molecular masses were obtained for singly sedimenting species: Q350A/V351A/D352A/T353A, 162 kDa; L354A/E355A/L356A/S357A, 164 kDa; G358R, 167 kDa. The molecular mass of the dynamin 3 construct is 86 kDa. S, Svedberg; AU, absorbance units. b, Liposome co-sedimentation assays for dynamin 3 and the indicated mutants. S, supernatant; P, pellet fraction; WT, wild type. c, The observed rate of GTPase activity for dynamin 3 and the indicated mutants in the absence or presence of liposomes. The average of two independent measurements is shown, with deviations ranging from 1% to 13%. d, The capacity of dynamin 2 mutants to reconstitute defective CME in HeLa cells depleted of endogenous dynamin 2, as monitored by fluorescent transferrin uptake. Data shown represent mean ± s.e.m.; the number of independent experiments is indicated in each bar. Sequence QIDT (amino acids 350-353) in dynamin 2 corresponds to QVDT in dynamin 3. siRNA, short interfering RNA. Raw data for b are available in the Supplementary Information.
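The assignment of the L1N^S and G358R mutants as dimers follows from simple arithmetic on the sedimentation velocity masses in Fig. 2a. Each observed mass is divided by the 86 kDa mass of the construct and rounded to the nearest integer. The helper below is a hypothetical illustration of that calculation. It ignores the uncertainty of mass estimates from sedimentation velocity.

```python
def oligomeric_state(observed_kda, monomer_kda=86.0):
    """Return (mass ratio, nearest integer oligomeric state) for a sedimenting species."""
    ratio = observed_kda / monomer_kda
    return ratio, round(ratio)


# Masses reported in Fig. 2a for singly sedimenting species
for name, mass in [("Q350A/V351A/D352A/T353A", 162),
                   ("L354A/E355A/L356A/S357A", 164),
                   ("G358R", 167)]:
    ratio, n = oligomeric_state(mass)
    print(f"{name}: {ratio:.2f} x monomer -> {n}-mer")
```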

The dimers in the dynamin tetramer are asymmetric with respect to the PH domain and the orientation of the G domain and the BSE (Fig. 1). Compared with an outer molecule, the G domains of inner molecules are tilted by approximately 40° around hinge 1 between the BSE and the stalk (Extended Data Fig. 4a). The PH domains of the outer molecules bind to a conserved surface of the stalk (Fig. 3a, b), a site similar to that in dimeric dynamin 1 (ref. 2) (Extended Data Fig. 4b). The assignment of the visible PH domains to the outer molecules is unambiguous (Extended Data Fig. 4c). The PH domains of the inner molecules were not resolved in the electron density (Extended Data Fig. 4d). Modelling the inner PH domains into positions equivalent to those observed for the outer molecules leads to clashes with the outer stalks (Fig. 3c and Extended Data Fig. 4e). Apparently, the PH domains have to be released from their autoinhibitory site for oligomerization to proceed. In keeping with this assumption, in the absence of membranes and the presence of nucleotides, a dynamin 3 variant lacking the PH domain assembled into regular oligomers more efficiently than wild-type dynamin 3 did (Extended Data Fig. 5a-c). Dynamin 3 tubulated liposomes on its own and did not need a specific membrane curvature for binding. At physiological salt concentrations, dynamin 3 efficiently bound to and tubulated unfiltered Folch liposomes (Extended Data Fig. 5d, e). This is in line with the presence of a tyrosine at position 596, which has been suggested to serve as a determinant for curvature generation versus curvature sensing. When expressed in mammalian cells, dynamin 2(ΔPH) formed large, presumably cytosolic aggregates that failed to co-localize with clathrin and interfered with transferrin uptake in a dominant-negative fashion (Extended Data Fig. 5f, g), as has been shown previously for dynamin 1 (ref. 19). These results indicate that the PH domain has important functions in oligomerization and membrane binding.

To investigate this dual function of the PH domain, we introduced the mutations R364S, R518H, R518D and E355A into the interface between the PH domain and stalk (Extended Data Fig. 6). Similar to nearby interface-3 mutants, most of these mutations impeded assembly, liposome binding, liposome-stimulated GTPase activity and transferrin uptake. In contrast, the mutation R518D enhanced oligomerization and GTP hydrolysis, as previously described for mutants in the PH-domain-stalk interface.

Molecular dynamics (MD) simulations were carried out and analysed by Markov models to characterize the dynamics of the PH-domain-stalk interface and its interplay with interface 3 (Fig. 3 and Extended Data Fig. 7). The simulations showed that E355 and K361, together with R518 and R364, are part of a network of polar interactions (Extended Data Fig. 7a) that can rapidly interconvert, leading to three distinct binding modes (Fig. 3c). The preferred binding interaction of the PH domain with the stalk was the autoinhibitory ‘closed’ conformation found in our crystals. In two other ‘open’ conformations, the PH domain was shifted along the stalk to a position where it did not interfere with oligomerization, indicating a dynamic equilibrium of oligomerization-permissive and non-permissive binding modes. The K361S mutation resulted in the appearance of a fourth, highly populated conformation that was also autoinhibitory for oligomerization, whereas no oligomerization-permissive binding modes were detected (Fig. 3c, d). Further MD simulations and Markov models of the stalk with a dissociated PH domain indicated that the K361S mutation predominantly stabilizes loop L1N^S in a conformation that is not adopted in the wild type (Extended Data Fig. 7c). Together, these results may explain the reduced oligomerization capacity of K361S, which is dimeric in solution (Extended Data Fig. 6c). A set of highly conserved charged residues including K361 apparently regulates both the autoinhibitory interaction with the PH domain and interactions with L1N^S, thereby tightly coupling autoinhibition and oligomerization.

Figure 3 | Coupling of autoinhibition and oligomerization. a, Stalks and PH domains in the dynamin tetramer as seen in the crystal. The box defines the view displayed in c. b, Close-up view of the PH-domain-stalk interface from a. Mutations in dynamin 2 implicated in centronuclear myopathy are indicated as pink balls; K361 and E355 as purple and black balls, respectively. c, Markov state models were constructed from MD simulation data including the stalk and PH domain. For each metastable PH domain conformation, three representative structures are shown. Clashes between the PH domain and the assembling stalk (light-blue surface) are indicated by black ovals. Percentages indicate the occurrence of each metastable state for wild-type dynamin 3 (black) and the mutant K361S (red). Open/closed: position of the PH domain allows/inhibits oligomerization. d, Example trajectories. The conformational dynamics are projected onto the Glu368-Arg618 distance, which illustrates opening and closure of the PH domain at the autoinhibited site.
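The state populations quoted in Fig. 3c come from Markov state models. The core of such a model can be sketched as follows. Frames of a discretized MD trajectory are counted as transitions at a chosen lag time, the counts are row-normalized into a transition matrix, and its stationary distribution gives the equilibrium population of each state. This is a generic, minimal sketch, not the authors' pipeline. Production MSM analyses typically also enforce detailed balance, choose the lag time from implied timescales, and lump microstates into metastable sets (for example with PCCA+). The function names are hypothetical.

```python
import numpy as np


def count_matrix(dtraj, n_states, lag=1):
    """Count transitions i -> j between frames separated by `lag` steps."""
    dtraj = np.asarray(dtraj, dtype=int)
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1)  # accumulates repeated index pairs
    return counts


def transition_matrix(counts):
    """Row-normalize counts (simple non-reversible maximum-likelihood estimate)."""
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def stationary_distribution(T):
    """Left eigenvector of T with eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(T.T)
    idx = np.argmin(np.abs(evals - 1.0))
    pi = np.real(evecs[:, idx])
    return pi / pi.sum()


# Example: a toy two-state trajectory (state 0 = 'closed', state 1 = 'open')
dtraj = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
T = transition_matrix(count_matrix(dtraj, n_states=2, lag=1))
print(T)
print(stationary_distribution(np.array([[0.9, 0.1], [0.2, 0.8]])))
```

Comparing stationary populations between two systems, such as wild type and K361S, is how a shift between permissive and autoinhibitory binding modes would appear in this framework.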

Comparison of the dynamin tetramer with the filament-like arrangements observed in the crystal structures of dynamin 1 (refs 2, 3) shows that the tetramer is bent, such that the angle between the outer stalks is changed by 20° (Figs 1 and 4a). We constructed a dynamin oligomer by stepwise addition of tetramers to the free ends of the growing dynamin assembly, using the geometry of interface 3 to connect the tetramers. This led to a right-handed helix (Fig. 4b) closely matching the dimensions of the dynamin 1 helix in the non-constricted, nucleotide-free state. These observations indicate that formation of a right-handed dynamin helix at the surface of a tubular membrane is an intrinsic feature of stalk assembly via interface 3. The bent shape of the tetramer appears to dictate the curvature of the membrane tubule around which dynamin preferentially oligomerizes. Constriction of membrane tubules to inner diameters smaller than 16 nm requires active GTP turnover and the associated G domain interactions across helical turns. Comparison of our structure with a recent cryo-EM model of a super-constricted dynamin helix (Extended Data Fig. 2c) suggests that constriction of the dynamin helix is driven by conformational changes in the stalk interfaces.

Figure 4 | Assembly of the stalks leads to a right-handed dynamin helix. a, Bent architecture of the dynamin 3 tetramer. Only the stalk helices are shown, as cylinders; first dimer in light blue, second dimer in dark blue. b, Assembly of dynamin 3 tetramers using the geometry of interface 3 leads to a right-handed helix that fits an EM map of the non-constricted dynamin 1 helix (shown in mesh representation). For clarity, only the stalks are displayed. In the inset, the G domain and BSE of an inner molecule and the PH domain of the adjacent outer molecule are shown in surface representation.
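Repeatedly applying the interface-3 geometry yields a helix because of a general property of rigid-body transformations. Iterating any fixed rotation-plus-translation (a screw motion) traces out a helix. The sign of the rotation relative to the translation along the screw axis sets the handedness. The sketch below illustrates this with a simplified, hypothetical transformation: a rotation about the z axis plus a rise along z. The angle, rise and starting point are illustrative values, not parameters derived from the crystal structure. Handedness is judged from the sign of the discrete torsion, i.e. the triple product of successive steps.

```python
import numpy as np


def screw_transform(angle_deg, rise):
    """Hypothetical rigid-body step: rotate by angle_deg about z, then translate by rise along z."""
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a),  np.cos(a), 0.0],
                  [0.0,        0.0,       1.0]])
    t = np.array([0.0, 0.0, rise])
    return R, t


def build_assembly(start, R, t, n_units):
    """Place successive units by applying the same transformation to the previous one."""
    coords = [np.asarray(start, dtype=float)]
    for _ in range(n_units - 1):
        coords.append(R @ coords[-1] + t)
    return np.array(coords)


def handedness(coords):
    """Sign of the discrete torsion (triple product of successive steps); needs >= 4 points."""
    d = np.diff(coords, axis=0)
    triple = np.einsum("ij,ij->i", np.cross(d[:-2], d[1:-1]), d[2:])
    return "right" if np.mean(triple) > 0 else "left"


R, t = screw_transform(30.0, 1.0)
units = build_assembly([5.0, 0.0, 0.0], R, t, 12)
print(handedness(units), np.hypot(units[:, 0], units[:, 1]))  # constant radius
```

A real interface-3 transformation would be a rotation about an arbitrary axis combined with a translation. By Chasles' theorem, any such rigid motion can still be written as a screw motion, so the same reasoning applies. The helix radius and pitch are then fixed entirely by the geometry of the repeating contact.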

The stalks of our helix model fit well into a cryo-EM map of nuc- 
leotide-free dynamin 1 assembled around a lipid tubule** (Fig. 4b), but 
the PH domains and the G domains protrude from the electron density 
(Fig. 4b inset and Extended Data Fig. 8a). Apparently, the PH domains 
are shed from the autoinhibitory stalk interface to bind the membrane 
tubule, whereas the G domains move upwards from their positions. To 
explain the assembly of dynamin, we propose an equilibrium between 
PH domains bound (as seen for the outer molecules) and unbound (as 
for the inner molecules) to their stalks. In the cytosol, this equilibrium 
lies to the autoinhibited tetramer to prevent untimely oligomerization. 
Centronuclear-myopathy-related mutations in the interface between 
the stalk and PH domain (Fig. 3b and Extended Data Fig. 9) shift the 
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equilibrium towards the oligomerized state, thereby leading to disease. 
Upon dynamin recruitment by accessory proteins to endocytic sites, 
the equilibrium is driven towards the assembly-competent conforma- 
tion. This hypothesis is supported by studies showing that, in vivo, 
dynamin helices are built by incorporation of dimer or tetramer units 
rather than larger preformed dynamin assemblies””*. Further inter- 
actions, which may influence the assembly equilibrium, occur between 
the BSE and stalk, or the G domain and PH domain of adjacent dimers 
(Extended Data Fig. 8b-d). This view may also explain the effect of the
disease-relevant mutation R465W. In the tetramer, R465 of an outer molecule
lies close to the inner BSE of an adjacent dimer, and a tryptophan at this
position is likely to modify this interaction, resulting in enhanced
oligomerization.

A striking feature of dynamin assembly is the multitude of interac- 
tions in all four molecules of the tetramer (Extended Data Fig. 3a). Our 
results indicate that these contacts are not necessarily static, but are 
characterized by a dynamic equilibrium of different binding
conformations. The formation of new interactions during assembly is
compensated for by the release of autoinhibitory contacts in the dynamin
tetramer. Such an assembly mode, involving many low-affinity interaction
sites, facilitates reversibility and allows regulation, for example through
nucleotide binding, hydrolysis or phosphorylation”. It is the basis for the
particular way in which the semi-solid dynamin polymer interacts with its
protein and membrane environment, a mode of interaction previously identified
for other CME proteins and termed ‘matricity’*®.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Protein expression and purification. Human dynamin 3 (splice form abb*',
residues 1-754) and the indicated mutants of this construct were expressed from
the pProEx-HTb vector (Invitrogen) as N-terminal His6-tag fusions followed by
a tobacco etch virus (TEV) cleavage site. The crystallized construct contained
the K361S mutation. Proteins were produced in Escherichia coli host strain
BL21(DE3), and expression was induced by addition of 0.1 mM
isopropyl-β-D-thiogalactopyranoside. Cells were grown overnight at 20 °C in
terrific broth medium. The following procedure was used for purification of
dynamin 3(K361S) for crystallization. Cells were resuspended in buffer A300
(50 mM HEPES-NaOH (pH 7.5), 300 mM NaCl, 15 mM imidazole and 2 mM MgCl₂)
including 1 mM phenylmethylsulfonyl fluoride and 0.1% v/v NP-40, and disrupted
by sonication. Cleared lysates (30,000g, 1 h, 4 °C) were applied to a Ni²⁺-NTA
column (Qiagen). The column was sequentially washed with buffer A300 and with
buffer A100 (100 mM NaCl). Protein was eluted with buffer A100 containing an
additional 285 mM imidazole. Fractions containing human dynamin 3 were pooled
and diluted with an equal volume of 50 mM HEPES-NaOH (pH 7.5). The
diluted protein was loaded onto a HiLoad SuperQ anion exchange column (GE
Healthcare) equilibrated with buffer B50 (in which 50 refers to the NaCl
concentration) containing 50 mM HEPES-NaOH (pH 7.5), 50 mM NaCl, 2 mM DTE and
1 mM MgCl₂. After washing with buffer B50, bound proteins were eluted with a
linear gradient from 50 to 500 mM NaCl. Fractions containing human dynamin 3
were pooled, 1 mg TEV protease per 10 mg dynamin 3 was added, and the protein
was incubated on ice for 4 h. The solution was concentrated using 50 kDa
molecular-weight cut-off concentrators (Amicon) and applied onto a Superdex 200
gel filtration column (GE Healthcare) equilibrated with buffer B100. Fractions
containing dynamin 3 were pooled, concentrated and flash-frozen in liquid nitrogen.

Wild-type and mutant dynamin 3 used for biochemical and biophysical
assays were expressed in E. coli Rosetta2-BL21-DE3 in autoinduction medium
(Novagen) and purified using a Co²⁺-Talon column, followed by overnight TEV
cleavage (4 °C, 30 µg per 1 mg fusion protein), dilution/concentration in
concentrators for imidazole removal and a second Co²⁺-Talon column run to
capture His6-TEV and uncleaved His6-dynamin. Finally, the peak fractions from a
Superdex 200 gel filtration containing dynamin were pooled, concentrated to at
most 20 mg ml⁻¹ and flash-frozen in liquid nitrogen. The purification buffer
contained 20 mM HEPES-NaOH, pH 7.8, 500 mM NaCl and 2 mM MgCl₂ (plus 100 mM
imidazole for elution, plus 2.8 mM β-mercaptoethanol during TEV cleavage). The
purified protein was nucleotide-free, as confirmed by high-performance liquid
chromatography analysis (see below for details).
Crystallization and structure determination. Crystallization trials by the sitting- 
drop vapour-diffusion method were performed at 4°C using a mosquito LCP 
pipetting robot (TTP Labtech) and Rock Imager storage system (Formulatrix). 
Human dynamin 3 in 150 nl volumes at a concentration of 20 mg ml⁻¹ was mixed
with an equal volume of reservoir solution from commercially available
preformulated screens (Qiagen). On a preparative scale, 2 µl of protein solution
was mixed with 2 µl of reservoir solution containing 100 mM MES-NaOH (pH 6.5)
and 15% 2-methyl-2,4-pentanediol. Crystals appeared after three to five days and
reached final dimensions of up to 0.5 mm × 0.3 mm × 0.3 mm. Crystals were
cryoprotected by immersion in reservoir solution supplemented with increasing
amounts of ethylene glycol, the final solution containing 17% v/v ethylene glycol.
Cryoprotected crystals were flash-cooled in liquid nitrogen. Data were recorded
at beamline PXI-X06SA at the Swiss Light Source (Villigen, Switzerland). Native
data from a single crystal were processed and scaled using the program package
XDS*. The structure was solved by molecular replacement with Phaser** using the 
structure of the nucleotide-free rat dynamin 1 G domain (2AKA), the stalk of 
human dynamin (3SNH) and the human PH domain (1DYN) as search models. 
The model was built using Coot™ and iteratively refined using Phenix® with 
noncrystallographic symmetry (NCS) between the outer and the inner molecules, 
respectively, with reference model restraints against an artificial dynamin con- 
struct composed of the high resolution search model domains, and with one 
translation-libration-screw (TLS) group per domain. 

Owing to weak electron density, all residues of the G domains of the inner
molecules were truncated at the Cβ atoms, and the whole domains were refined
as rigid bodies. In the final model, the outer molecules have disordered regions in
the LINS loop, the L1” and L2°” loops and the L5? loop, and the inner molecules
in the hinge 1 region and the L2° loop. Furthermore, the complete PH domains of
the inner molecules are not resolved in the electron density. The structure was
refined to Rwork/Rfree of 23.2%/27.8%. Of all residues, 94.5% are in the most
favoured regions of the Ramachandran plot and 0.6% (15 out of 2,500) of residues
are in the disallowed regions, as analysed with MolProbity*’. Figures were prepared
with PyMOL*’”. Domain superpositions were performed with lsqkab**. Sequences
were aligned using Clustal W” and adjusted by hand. The model of the
right-handed dynamin 3 helix was fitted manually into the EM map using
PyMOL” and Chimera”.

Analytical ultracentrifugation. Sedimentation velocity experiments were carried
out in a ProteomeLab XL-I analytical ultracentrifuge (Beckman Coulter) at
35,000 r.p.m. and 20 °C using an An-50 Ti rotor. Concentration profiles were
measured using the manufacturer's data acquisition software ProteomeLab XL-I
Version 6.0 (Firmware 5.7) with the absorption scanning optics at 280 nm.
Sedimentation velocity analysis was performed in a buffer containing
0.15 M NaCl, 50 mM HEPES-NaOH (pH 7.5) in 3 or 12 mm standard double-sector
centrepieces filled with a 100 µl or 400 µl sample, respectively. For data
analysis, a model for diffusion-deconvoluted differential sedimentation coefficient
distributions (continuous c(s) distributions) implemented in the program
SEDFIT“' was used. For proteins sedimenting as a single species, molecular masses
were obtained from c(s) analysis as calculated from the s-value and diffusion
broadening of the sedimenting boundary. The dynamin 3(K361S/R399A/ΔPRD)
mutant, analysed in a concentration range from 4 to 23 µM, showed a single peak
in c(s) distributions with a sedimentation coefficient slightly decreasing with
increasing protein concentration (data not shown). Owing to hydrodynamic
non-ideality, this is expected for a protein that does not change its oligomerization
state with concentration”’. Extrapolation to zero concentration yielded
s⁰20,w = 6.4 S, and a molecular mass of 160 kDa was obtained from c(s) analysis.
Because the molecular mass of the monomer as calculated from the amino acid
composition is 86 kDa, this mutant forms dimers in solution. For comparison, all
other mutants were analysed at a concentration of about 20 µM. The following
molecular masses were obtained from c(s) analyses of mutants that sedimented as
a single species: Q350A/V351A/D352A/T353A, 162 kDa; L354A/E355A/L356A/S357A,
164 kDa; G358R, 167 kDa.
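The two numerical steps in this paragraph, extrapolating the sedimentation coefficient to zero concentration and reading the oligomeric state from the ratio of the c(s)-derived mass to the 86 kDa monomer mass, can be sketched as below. This is a minimal illustration assuming a straight-line dependence of s on concentration over the measured range; the function names are hypothetical, and rounding the mass ratio to the nearest integer is an assumed decision rule, not part of the SEDFIT procedure.

```python
def extrapolate_to_zero(concentrations_um, s_values):
    """Least-squares line s(c) = s0 + slope * c; returns s0, the value at c -> 0.

    Assumes s depends linearly on concentration over the measured range.
    """
    n = len(concentrations_um)
    if n < 2 or len(s_values) != n:
        raise ValueError("need at least two paired (c, s) measurements")
    mean_c = sum(concentrations_um) / n
    mean_s = sum(s_values) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations_um)
    sxy = sum((c - mean_c) * (s - mean_s)
              for c, s in zip(concentrations_um, s_values))
    slope = sxy / sxx
    return mean_s - slope * mean_c


def oligomeric_state(measured_kda, monomer_kda=86.0):
    """Ratio of measured to monomer mass, and the nearest whole number of subunits."""
    ratio = measured_kda / monomer_kda
    return ratio, round(ratio)


# Molecular masses reported in the text for single-species samples (kDa).
for label, mass in [("K361S/R399A/ΔPRD", 160), ("Q350A/V351A/D352A/T353A", 162),
                    ("L354A/E355A/L356A/S357A", 164), ("G358R", 167)]:
    ratio, subunits = oligomeric_state(mass)
    print(f"{label}: {ratio:.2f} x monomer -> {subunits}-mer")
```

All four reported masses lie between 1.8 and 2.0 times the monomer mass, which is why they are read as dimers.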

Partial specific volume, buffer density and viscosity were calculated from amino
acid and buffer composition, respectively, by the program SEDNTERP* and were
used to correct experimental s-values to s20,w (water at 20 °C). Figures were prepared using the
program GUSSI (http://biophysics.swmed.edu/MBR/software.html, provided by 
C. Brautigam). 
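SEDNTERP supplies the partial specific volume, buffer density and buffer viscosity; the correction these values feed into is the standard one, s20,w = s_exp × (η_buffer/η_water,20) × (1 − v̄ρ_water,20)/(1 − v̄ρ_buffer), with buffer properties taken at the run temperature (20 °C here). The sketch below assumes standard values for water at 20 °C (ρ ≈ 0.99823 g ml⁻¹, η ≈ 1.002 mPa s); it illustrates the formula only and is not a reimplementation of SEDNTERP.

```python
WATER_20C_DENSITY = 0.99823    # g/ml, water at 20 °C
WATER_20C_VISCOSITY = 1.002    # mPa s (cP), water at 20 °C


def s20w(s_exp, vbar, buffer_density, buffer_viscosity,
         water_density=WATER_20C_DENSITY, water_viscosity=WATER_20C_VISCOSITY):
    """Correct an observed sedimentation coefficient to water at 20 °C.

    s_exp: observed s-value (e.g. in Svedberg); vbar: partial specific volume (ml/g);
    buffer_density (g/ml) and buffer_viscosity (mPa s) at the run temperature.
    """
    buoyancy_water = 1.0 - vbar * water_density
    buoyancy_buffer = 1.0 - vbar * buffer_density
    if buoyancy_buffer <= 0:
        raise ValueError("vbar * buffer_density must be below 1")
    viscosity_ratio = buffer_viscosity / water_viscosity
    return s_exp * viscosity_ratio * (buoyancy_water / buoyancy_buffer)
```

A denser or more viscous buffer slows sedimentation, so the corrected s20,w comes out larger than the observed value; in pure water at 20 °C the correction is the identity.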

Liposome co-sedimentation assays. Liposomes were prepared as previously 
described (http://www.endocytosis.org). Folch liposomes (total bovine brain lipids 
fraction I from Sigma) in 20 mM HEPES-NaOH (pH 7.5), 100 mM NaCl were 
extruded 13 times through a 0.1 µm filter. The resulting 0.2 mg ml⁻¹ liposomes
were incubated at room temperature with 4.0 µM of the indicated dynamin 3
construct for 10 min in a 40 µl reaction volume, followed by a 213,000g spin for
10 min at 20 °C. The final reaction buffer contained 25 mM HEPES-NaOH pH 7.5,
140 mM NaCl, 2 mM MgCl₂ and 1 mM KCl.

GTP hydrolysis assay. GTPase activities of 1 µM of the indicated dynamin
constructs were determined at 37 °C in 25 mM HEPES-NaOH (pH 7.5), 130 mM
NaCl, 2 mM MgCl₂ and 1 mM KCl, in the absence and presence of 0.1 mg ml⁻¹
0.1-µm-filtered Folch liposomes, using saturating concentrations of GTP as
substrate (1 mM for the basal and 3 mM for the stimulated reactions). Reactions
were initiated by the addition of protein to the reaction. At different time points,
reaction aliquots were diluted 15-fold and quickly transferred to liquid nitrogen.
Nucleotides in the samples were separated via a reversed-phase Hypersil ODS-2
C18 column (250 × 4 mm), with 100 mM potassium phosphate buffer (pH 6.5),
10 mM tetrabutylammonium bromide and 7.5% acetonitrile as running buffer. 
Denatured proteins were adsorbed on a C18 guard column. Nucleotides were 
detected by absorption at 254 nm and quantified by integration of the correspond- 
ing peaks. Rates were derived from a linear fit to the initial reaction (<20% GTP 
hydrolysed). 
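The rate determination described above, a straight-line fit restricted to the initial phase (<20% of GTP hydrolysed), can be sketched as follows. The sketch assumes that the HPLC readout is expressed as the fraction of GTP remaining at each time point and that rates are reported per dynamin molecule (min⁻¹), obtained by dividing the fitted slope by the protein concentration; both are assumptions, and the function name is hypothetical.

```python
def initial_rate(times_min, gtp_fraction_remaining, gtp0_um, dynamin_um,
                 max_fraction_hydrolysed=0.2):
    """Turnover rate (per minute per dynamin) from the initial, linear phase.

    times_min: sampling times; gtp_fraction_remaining: GTP/(GTP + GDP) at each time;
    gtp0_um: starting GTP concentration; dynamin_um: protein concentration.
    """
    # Keep only time points before 20% of the GTP has been hydrolysed.
    points = [(t, (1.0 - f) * gtp0_um)
              for t, f in zip(times_min, gtp_fraction_remaining)
              if 1.0 - f < max_fraction_hydrolysed]
    if len(points) < 2:
        raise ValueError("need at least two time points in the initial phase")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_p = sum(p for _, p in points) / n
    # Ordinary least-squares slope of hydrolysed GTP (µM) versus time (min).
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in points)
    slope_um_per_min = sxy / sxx
    return slope_um_per_min / dynamin_um


# Illustrative input only (not data from the paper): 1 mM GTP, 1 µM dynamin.
rate = initial_rate([0, 5, 10, 15], [1.0, 0.99, 0.98, 0.97], gtp0_um=1000, dynamin_um=1)
```

Restricting the fit to the early phase keeps the substrate effectively saturating, so the slope reflects the initial velocity rather than a curve flattening as GTP is depleted.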

Electron microscopy. For electron microscopic studies (Zeiss EM910), 2 µM
dynamin 3 in 25 mM HEPES-NaOH (pH 7.5), 60 or 150 mM NaCl, 2 mM
MgCl₂, 1 mM KCl and 1 mM guanosine-5′-[(β,γ)-methyleno]triphosphate were
incubated at room temperature for 20 h without liposomes or for 20 min with
liposomes. The final concentration of unfiltered liposomes was 0.35 mg ml⁻¹.
Samples were spotted on carbon-coated copper grids (Plano GmbH) and
negatively stained with 3% uranyl acetate.

Transferrin uptake in HeLa cells. HeLa cells (ATCC no. CCL2, identified by their 
short tandem repeat and isoenzyme profiles) were obtained from ATCC 
(American Type Culture Collection) and tested for mycoplasma contamination 
on a routine basis. Cells were cultured for 20-25 passages without further authen- 
tication before starting a fresh culture from the immediately ATCC-derived stock. 
HeLa cells were transfected with siRNA using oligofectamine (Invitrogen). The 
sequence of the siRNA used to target human dynamin 2 was
5′-GCAACUGACCAACCACAUC-3′ (nucleotides 849-867). After 24 h, cells were
transfected with pEGFP-N1 (Clontech) or siRNA-resistant rat dynamin 2-pEGFP-N1
using Lipofectamine 2000 (Invitrogen). 72 h after siRNA transfection, cells were
serum-starved for 1 h and incubated with 15 µg ml⁻¹ transferrin-Alexa 647
(Molecular Probes, Invitrogen) for 10 min at 37 °C. On ice, cells were washed once
with cold PBS and 10 mM MgCl₂ and once for 90 s with 0.1 M acetic acid (pH 5.3)
and 200 mM NaCl to remove surface-bound transferrin. After two washes with cold
PBS and 10 mM MgCl₂, cells were detached from the culture dish by incubating for
5 min on ice with 0.1% Pronase E solution in PBS and 0.5 mM EDTA. Cells were
resuspended in 1% bovine serum albumin (BSA) in PBS, pelleted at 300g for 5 min
at 4 °C and then fixed in 4% paraformaldehyde, 4% sucrose in PBS for 15 min on
ice and a further 20 min at room temperature. Cells were pelleted, resuspended
in 1% BSA in PBS and analysed by flow cytometry using a BD FACSCalibur.
Transferrin fluorescence in GFP-positive cells was quantified and normalized to 
cells rescued with wild-type dynamin 2-eGFP. No statistical methods were used to 
predetermine sample size. 
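The normalization step can be written out as below: each experiment's mean transferrin signal in GFP-positive cells is divided by that experiment's wild-type rescue value, and the normalized values are then summarized as mean ± s.e.m. over independent experiments, the form in which the transferrin data are presented in Extended Data Figs 5g and 6f. Normalizing within each experiment before averaging is an assumption about the order of operations, and the condition labels used are hypothetical.

```python
import math
import statistics


def normalized_uptake(experiments, reference="WT"):
    """Normalize each experiment to its own wild-type rescue, then summarize.

    experiments: list of dicts, one per independent experiment, mapping a
    condition label to the mean transferrin signal of GFP-positive cells.
    Returns {condition: (mean, s.e.m., n)} of the normalized values.
    """
    normalized = {}
    for experiment in experiments:
        ref_signal = experiment[reference]
        for condition, signal in experiment.items():
            normalized.setdefault(condition, []).append(signal / ref_signal)
    summary = {}
    for condition, values in normalized.items():
        n = len(values)
        mean = statistics.fmean(values)
        # The s.e.m. needs at least two independent experiments.
        sem = statistics.stdev(values) / math.sqrt(n) if n > 1 else float("nan")
        summary[condition] = (mean, sem, n)
    return summary
```

By construction the wild-type rescue condition always normalizes to 1.0, so its spread is zero and the other conditions are read relative to it.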
Localization of dynamin 2-eGFP mutants and analysis of clathrin-coated pit 
dynamics. HeLa cells depleted of endogenous dynamin 2 as described above were 
co-transfected with plasmids encoding eGFP or dynamin 2-eGFP and mRFP- 
clathrin-light-chain. 72h after siRNA transfection, cells were analysed by total 
internal reflection fluorescence (TIRF) microscopy using a Nikon Eclipse Ti 
(Andor sCMOS camera, Okolab incubator, Nikon PerfectFocus autofocus system, 
60× TIRF objective, operated by open-source ImageJ-based Micromanager
software“). For live imaging, cells growing on glass coverslips were kept in Hank's
balanced salt solution with 5% fetal bovine serum. From 180 s dual-colour TIRF
recordings with a frame rate of 0.5 Hz, kymographs were created by selecting a line
of pixels from an individual cell and depicting this line over the duration of 90 
frames. 
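A 180 s recording at 0.5 Hz contains 180 × 0.5 = 90 frames, which is the 90-frame depth of each kymograph. The construction itself, sampling the same line of pixels in every frame and stacking the samples in time order, can be sketched as below. Frames are assumed to be 2D lists of intensities indexed [row][column], and pixels along the line are picked by nearest-neighbour sampling; the line tool actually used (for example in ImageJ) may interpolate differently.

```python
def frame_count(duration_s, frame_rate_hz):
    """Number of frames in a recording of the given length."""
    return round(duration_s * frame_rate_hz)


def sample_line(frame, start, end):
    """Nearest-neighbour intensities along a straight line from start to end.

    start and end are (row, column) pixel coordinates; both endpoints are included.
    """
    (r0, c0), (r1, c1) = start, end
    steps = max(abs(r1 - r0), abs(c1 - c0))
    if steps == 0:
        return [frame[r0][c0]]
    return [frame[round(r0 + (r1 - r0) * i / steps)][round(c0 + (c1 - c0) * i / steps)]
            for i in range(steps + 1)]


def kymograph(frames, start, end):
    """One row per frame: kymo[t][x] is the intensity at time t, position x on the line."""
    return [sample_line(frame, start, end) for frame in frames]
```

In the resulting image, a clathrin-coated pit that stays in place appears as a vertical streak whose length reflects its lifetime, which is what makes kymographs a compact readout of pit dynamics.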
Molecular dynamics simulation and modelling. Molecular dynamics (MD) 
simulations of the stalk and the pleckstrin homology (PH) domain (residues 
322 to 710) were carried out for the wild type and the K361S mutant, each using 
three different setups: (1) the crystal structure coordinates of chain C superimposed
onto chain B were taken as the starting point; (2) starting from setup 1, the PH
domain was moved 5 Å away from the stalk; and (3) the PH domain was absent.
Setups 1 and 2 were used in order to study the conformational equilibrium of 
stalk-PH domain interactions. The loops joining the stalk to the PH domain 
(residues 495 to 511 and 628 to 640) were generated for each setup and mutant 
using VMD (visual molecular dynamics)** and were minimized and equilibrated 
separately. The aim of setup 3 was to study the intrinsic conformational
dynamics of the LIN® loop when the PH domain is dissociated from the stalk. 
The coordinates of each setup and for each mutant were used to construct an all- 
atom molecular model and run MD simulations in explicit solvent with 
GROMACS* using the CHARMM27 force field*”. For the setup and equilibration 
procedure, hydrogen atoms were added based on the heavy atom coordinates 
followed by an initial energy minimization. The protein was then solvated in a 
water box with a solvation layer of at least 10 Å, resulting in an overall system of
between 70,000 and 80,000 atoms (depending on the initial structure). Na⁺ and
Cl⁻ ions (100 mM) were added to buffer the system and obtain an overall neutral
simulation cell. The solvated and ionized system was again minimized and
equilibrated in the NVT (canonical) ensemble at 300 K with position constraints on
the protein heavy atoms. A second equilibration was carried out in the NPT
(isothermal-isobaric) ensemble, again with position constraints, followed by a 1 ns
equilibration without constraints. The equilibrated coordinates and velocities were
used as the starting point for twenty 100-ns production runs for each setup and
with both wild type and mutant K361S, giving rise to a total of 12 microseconds of
molecular dynamics data. 
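The simulation budget stated above follows directly from the design: three setups, two variants (wild type and K361S) and twenty 100-ns production runs per combination give 3 × 2 × 20 = 120 runs, or 12,000 ns = 12 µs. The enumeration below only makes that bookkeeping explicit; the labels are descriptive and are not file or job names from the study.

```python
from itertools import product

SETUPS = ("1: crystal pose", "2: PH domain moved 5 Å", "3: no PH domain")
VARIANTS = ("wild type", "K361S")
RUNS_PER_COMBINATION = 20
RUN_LENGTH_NS = 100

# One entry per independent production run.
runs = list(product(SETUPS, VARIANTS, range(RUNS_PER_COMBINATION)))
total_ns = len(runs) * RUN_LENGTH_NS
total_us = total_ns / 1000

print(f"{len(runs)} runs, {total_us:g} microseconds in total")
```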
Analysis with Markov state models. The conformations of the LIN® loop and of the
stalk-PH domain interface can be well characterized by their hydrogen-bonding
patterns within the loop or between stalk and PH domains. Here, 21 residue pairs,
shown in Extended Data Fig. 7d, were selected that can form hydrogen bonds or
salt bridges. The Cα distances of these residue pairs were evaluated in order to
obtain a low-dimensional representation of the respective configuration. These
distances were used to build Markov state models*!*“*“ for setup 3 (LIN® loop)
and for setups 1 and 2 combined (PH-domain-stalk interactions) using the
EMMA program (http://pyemma.org)°°. The microstates of the Markov state
models were obtained by regular spatial clustering in the distance space**. The
distance cutoff for the regular spatial clustering was chosen to obtain around 500
microstates. Using a cutoff of 10.75 Å for the wild type and a cutoff of 9 Å for the
mutant K361S resulted in 550 and 584 microstates, respectively. The
lag-time-dependent relaxation timescales, indicating approximate Markovianity*! at
lag times of 20 ns or larger, are shown in Extended Data Fig. 7. Reversible transition
matrices were then estimated at a lag time of 20 ns. The microstates of each
Markov state model were clustered into a set of three to four metastable states 
using the robust Perron cluster analysis*’. At this resolution, the metastable states 
sampled by the different mutants can be clearly associated between wild type and 
mutant, as shown in Fig. 3c. The Markov model was used to generate random 
trajectories shown in Fig. 3d as described in ref. 23. 
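The pipeline in this paragraph (distance features, regular spatial clustering into microstates, transition counts at a fixed lag time and relaxation timescales) can be illustrated in miniature as below. This is a toy sketch in plain Python, not PyEMMA: the reversible transition matrix is approximated by symmetrizing the count matrix, which enforces detailed balance but is not the maximum-likelihood reversible estimator, and the implied-timescale formula t = −τ/ln λ₂ is written out only for a two-state matrix, whose second eigenvalue is T₀₀ + T₁₁ − 1. PCCA coarse-graining into metastable states is not shown.

```python
import math


def regular_space_centers(points, cutoff):
    """Regular spatial clustering: a frame becomes a new center if it lies
    farther than `cutoff` from every center chosen so far."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > cutoff for c in centers):
            centers.append(p)
    return centers


def discretize(points, centers):
    """Assign each frame to its nearest center (microstate index)."""
    return [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points]


def count_matrix(dtraj, lag, n_states):
    """Count transitions i -> j between frames separated by `lag` steps."""
    if lag < 1:
        raise ValueError("lag must be at least one step")
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1
    return counts


def symmetrized_transition_matrix(counts):
    """Row-normalize C + C^T; the result obeys detailed balance by construction."""
    n = len(counts)
    sym = [[counts[i][j] + counts[j][i] for j in range(n)] for i in range(n)]
    matrix = []
    for row in sym:
        total = sum(row)
        if total == 0:
            raise ValueError("a state has no observed transitions at this lag")
        matrix.append([x / total for x in row])
    return matrix


def implied_timescale_two_state(T, lag):
    """Relaxation timescale -lag / ln(lambda_2) of a 2x2 transition matrix."""
    lambda_2 = T[0][0] + T[1][1] - 1.0
    if not 0.0 < lambda_2 < 1.0:
        raise ValueError("timescale undefined for this eigenvalue")
    return -lag / math.log(lambda_2)
```

The choice of lag time matters because the implied timescales only stop changing with τ once the dynamics between microstates are approximately memoryless; that plateau is what the 20 ns lag in the text is read from.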
Extended Data Figure 1 | Characterization of the dynamin 3 construct.
a, Top: the domain structure of dynamin 3. The previously used sequence-derived
domain nomenclature is shown below. Bottom: a dynamin 3 monomer colour-coded
according to the domain architecture. b, SDS-PAGE representing a typical
purification of dynamin 3. M, marker proteins; NI, whole-cell lysate, non-induced;
SN, supernatant of cleared lysate; E, elution peak of the Talon-Co²⁺ column;
CL, after cleavage with TEV protease; P, pool after gel filtration.
c, Representative electron density map (stereo view). Two stalk helices are shown
as stick models; the 2Fo − Fc map is contoured at 1.0σ. Raw data for b is available
in Supplementary Information.
Extended Data Figure 2 | Dimerization of dynamin 3. a, Superposition of
dynamin 1 (grey; PDB code: 3SNH) and dynamin 3 (magenta and green) 
dimers, colour-coded as in Fig. 1. The stalk arrangement in dynamin 3 is 
essentially the same as in dynamin 1. b, Interface 2 in dynamin 3. The view is 
rotated by 90° with respect to a. The zoom shows the side chains of residues 
involved in interface formation. Residues whose mutation renders dynamin 1
and dynamin 3 monomeric are marked with an asterisk. c, Top: stalks of the
dynamin 3 tetramer, as seen in the crystal structure (left). Dynamin dimers
(dark and light blue) are formed via interface 2 (I2) and assemble into the
tetramer via interfaces 1 and 3 (I1 and I3, respectively). In alternative
dimerization models (middle and right)°’, dynamin monomers assemble via
interface 1 (middle) or interface 3 (right) to form elongated dimers of different
shapes. Bottom: arrangement of dynamin 1 stalks as fitted into a cryo-EM
density map of a super-constricted dynamin 1 helix (PDB code: 4UUD)”*.

d, Oligomeric state of dimer interface mutants, as assayed by analytical 
ultracentrifugation at a protein concentration of 20 µM. The following
molecular masses were obtained from c(s) analyses: dynamin 1(R361S/R399A) 
(176 kDa, dimeric) in dark blue; dynamin 1(1481D/H687D/L688S) (84 kDa, 
monomeric) in light blue; dynamin 3(K361S/R399A) (165 kDa, dimeric) in red, 
dynamin 3(1481D/H677D/L678S) (83 kDa, monomeric) in black. 
Extended Data Figure 3 | Dynamin assembly via interface 3. a, Schematic
overview of the interactions in interface 3. b, Details of loop LIN®. The 2Fo − Fc
electron density is contoured at 1.0σ. c, The Charcot-Marie-Tooth-related
mutation G358R is located at the C-terminal end of loop L1N®. It probably 
disturbs the structural integrity of this loop and therefore might interfere with 
oligomerization. d, Clathrin-coated pit dynamics in HeLa cells expressing 
interface 3 mutants of dynamin 2. HeLa cells treated with dynamin 2 siRNA 
were co-transfected with plasmids encoding eGFP or siRNA-resistant dynamin 
2-eGFP and mRFP-clathrin-light-chain, and live cells were imaged at 37 °C by
TIRF microscopy. Shown are representative time-resolved line scans 
(kymographs) from at least ten time-lapse recordings of individual cells. 
Attenuated clathrin-coated pit dynamics upon depletion of endogenous 
dynamin 2 are only rescued by re-expression of wild-type but not mutant 
dynamin 2-eGFP. Note that the dynamin 2 mutants tested displayed a more 
diffuse subcellular distribution although they were still recruited to clathrin- 
coated pits. 
Extended Data Figure 4 | Localization of the PH domain in the tetramer.
a, Superposition of an outer dynamin molecule (magenta) and an inner 
molecule (green) of the dynamin 3 tetramer. The comparison reveals an ~40° 
rotation of the G domains and BSEs. Furthermore, the PH domain is visible 
only in the outer molecule. b, Superposition of the stalk and PH domain in 
dynamin 1 (grey) and dynamin 3 (magenta). c, Connectivity of PH domain 
and stalk in the outer molecule. Shown are the stalk and PH domain of an outer 
molecule (magenta) and the stalk of the corresponding inner molecule (green) 
from a dimer. Since the gap of ~58 Å between V629 of the PH domain and
P643 of the inner stalk is too large to be spanned by the missing 13 residues 
(grey dashed line), we can unambiguously assign the PH domains in dynamin 
3 to the outer stalks (black dashed lines). All other potential connections 
including molecules from the second dimer or symmetry-related tetramers 
span even larger distances (not shown). In the crystal structures of dynamin 1,
an unequivocal assignment of the PH domain to a specific stalk was not
possible, owing to the long unresolved linker regions between the stalk and the PH
domains. Consequently, the importance of the interface between stalk and PH
domain has not been generally recognized*’. d, The outer PH domains are
clearly defined in the electron density (left panel), whereas no density for a PH
domain is observed in the equivalent position at the inner stalks (right panel).
The density visible in the right panel corresponds mainly to a G domain
from a symmetry-related molecule. The 2Fo − Fc electron density is contoured
at 1.0σ. e, Modelling of a PH domain (grey) relative to an inner stalk (green)
in the same geometry as seen in the outer molecules leads to steric clashes 
(black oval) with an adjacent stalk (blue). 
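The connectivity argument in panel c can be checked with a rough upper bound: consecutive Cα atoms in a trans peptide are about 3.8 Å apart, so the 13 unresolved residues between V629 and P643 (14 Cα–Cα steps) can span at most about 14 × 3.8 ≈ 53 Å even when fully extended, which is less than the ~58 Å gap. The sketch treats a fully extended chain as an idealized bound and assumes the gap is measured between Cα atoms; it illustrates the reasoning and is not the authors' calculation.

```python
CA_CA_STEP_A = 3.8  # approximate Cα–Cα distance between consecutive residues (trans peptide), Å


def max_span_a(residue_a, residue_b, step_a=CA_CA_STEP_A):
    """Upper bound on the Cα–Cα distance between two residues of one chain,
    reached only by a fully extended backbone."""
    return abs(residue_b - residue_a) * step_a


def can_bridge(residue_a, residue_b, gap_a):
    """True if an unresolved segment between the two residues could span the gap."""
    return max_span_a(residue_a, residue_b) >= gap_a


print(round(max_span_a(629, 643), 1), can_bridge(629, 643, 58.0))
```

Because real linkers are never fully extended, the true reach is shorter still, which only strengthens the assignment of the PH domains to the outer stalks.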
Extended Data Figure 5 | The PH domains regulate oligomerization of
dynamin. a, In the absence of liposomes, a dynamin 3 variant lacking the PH
domain (ΔPH) was sedimented more efficiently than wild-type dynamin 3
(WT). Both ΔPH and wild type lacked the PRD. The proteins were sedimented
by ultracentrifugation after 20 h of incubation at low salt concentrations
(60 mM NaCl) in the presence of the non-hydrolysable GTP analogue
GMPPCP. S, supernatant; P, pellet fraction. b, c, Representative negative-stain
electron micrographs of wild type (b) and ΔPH (c) under the same conditions
as in a. For each protein, at least eight micrographs were recorded. Both
constructs showed oligomeric ring structures, similar to structures seen for
full-length dynamin’. Our data indicate that oligomerization of dynamin
does not require membrane binding, but membrane binding requires
oligomerization (Fig. 2). d, In liposome co-sedimentation assays, dynamin 3
bound to Folch liposomes independently of their size. Not extr., not extruded.
e, At physiological salt concentrations (150 mM NaCl), dynamin 3 efficiently
tubulated unfiltered Folch liposomes. In contrast, ΔPH did not decorate the
liposome surface and did not induce liposome tubulation. For each setup, at
least 12 micrographs were recorded. f, When expressed in HeLa cells, dynamin
2(ΔPH) formed large cytosolic aggregates that did not co-localize with
mRFP-clathrin. Arrowheads indicate co-localization for wild-type dynamin 2.
Shown are magnified insets of representative images from at least 20 individual
cells, acquired by TIRF microscopy. g, Dynamin 2(ΔPH) was dominant-negative
in transferrin uptake assays. Data shown represent mean ± s.e.m.; the
number of independent experiments is indicated in the bar. Raw data for a and
d are available in Supplementary Information.
Extended Data Figure 6 | Mutational analysis of the interface between PH
domain and stalk. a, Analytical gel filtration analysis for wild-type dynamin 3
and the mutant R518D. The proteins were pre-incubated for 10 min at 22 °C or
37 °C. When pre-incubated at 37 °C, only R518D showed a higher-molecular-weight
species. AU, arbitrary units. b, Intrinsic GTPase activity of wild-type
dynamin 3 and the mutant R518D at 37 °C in the absence of liposomes. The
lines represent linear fits of GTP hydrolysis versus time. For R518D, a biphasic
behaviour of the GTPase activity was apparent (for wild type: kobs = 0.5 min⁻¹;
for R518D: kobs1 = 2.2 min⁻¹ and kobs2 = 13.3 min⁻¹). This biochemical
behaviour is reminiscent of dynamin 1 mutants in the PH-domain-stalk
interface that show increased oligomerization and GTPase rates when
incubated at 37 °C”. Perturbations in this interface appear to promote
oligomerization of dynamin, pointing to an autoinhibitory function of this
interface for oligomerization. The average of two independent measurements is
shown, with deviations ranging from 0% to 0.05% for wild type and 0% to 0.62%
for R518D. c, Analytical ultracentrifugation experiments for the indicated
dynamin 3 variants, as in Fig. 2a. For the mutant K361S, which sediments as a
single species, a molecular mass of 164 kDa was obtained from c(s) analysis,
indicating that this mutant forms dimers in solution. d, Liposome
co-sedimentation analysis for the indicated mutants. S, supernatant; P, pellet
fraction. e, GTPase activity of the indicated mutants in the absence and
presence of liposomes. Shown is the average of two independent measurements,
with deviations ranging from 1% to 11%. f, Ability of dynamin 2 mutants to
rescue defective CME of transferrin in the absence of endogenous dynamin 2. The
assay was performed as described in Fig. 2d. R518 in dynamin 3 corresponds to
R522 in dynamin 2, and the R522H mutation in dynamin 2 is implicated in
centronuclear myopathy. Data shown represent mean ± s.e.m.; the number of
independent experiments is indicated in the bar. Note that we generally observed
that the GTPase experiments were the most sensitive indicators of structural
perturbations induced by mutations. Compared with membrane binding assays,
GTPase assays appear to be more sensitive to the actual architecture of the
dynamin oligomer and to alterations induced by point mutations. Transferrin
uptake assays could be influenced by cellular factors, such as BAR-domain
proteins that may stabilize mutant dynamin forms with deficits in
oligomerization. Raw data for d is available in Supplementary Information.
Extended Data Figure 7 | Molecular dynamics simulations and Markov
models. a, The PH-domain-stalk interaction is characterized by a number
of mainly polar interactions. The represented conformation is one of the
starting structures (setup 2) for the MD simulations and quickly converts
into one of the metastable conformations shown in Fig. 3. b, Relaxation
timescales of different constructs as a function of lag time computed from
Markov models. The timescales of all models (black) have converged at a lag
time of about 20 ns within statistical uncertainty (colour-shaded regions),
indicating approximate Markovianity. The grey area indicates the region with
lag times larger than relaxation timescales. c, Top: intrinsic conformational
dynamics of the L1N® loop shown for the wild type (black) and the mutant
K361S (red). Bottom: six metastable conformations and their equilibrium
probabilities of the L1N® loop (setup 3) for the wild type (black) and mutant
K361S (red) computed from the Markov model. d, Residue pairs used to
characterize the L1N® loop and stalk-PH domain interactions.
Extended Data Figure 8 | Interactions of the G domain, stalk and BSE in
the tetramer. a, Two views on a fitting of the dynamin 3 tetramer crystal
structure into the EM density of non-constricted oligomerized dynamin 1
(ref. 24). The positions of the inner G domains are shown in all four molecules,
since the outer G domains in our crystals are stabilized by crystal contacts.
Apparently, membrane binding and oligomerization are associated with
major movements of the G domain, BSE and the PH domain (indicated by
arrows). b, A loop of the outer PH domain and an inner G domain are in
close proximity. c, The outer G domains (left), but not the inner G domains
(right), are well defined in the electron density. The 2Fo − Fc electron density is
contoured at 1.0σ. The weak electron density for the inner G domains and
the resulting uncertainty in determining the contact sites prevented us from
analysing this interaction in more detail. d, The BSE of an inner monomer
(grey) interacts with the stalk of an outer monomer (magenta). This contact
involves R465, which is mutated to tryptophan in some centronuclear
myopathy patients. The R465W mutation leads to hyperactive dynamin‘ that
fragments the T-tubule network in mouse-myoblast-derived myotubes and
Drosophila body wall muscle (Y.-W. Liu, personal communication).
Extended Data Figure 9 | Disease-relevant mutations in dynamin. Localizations of mutations leading to Charcot-Marie-Tooth neuropathy (black balls) and
centronuclear myopathy (pink balls) are plotted onto a dynamin 3 monomer. Colour code as in Fig. 1. 
Extended Data Table 1 | Data collection and refinement statistics

Data collection
Space group                         P2₁2₁2₁, 1 tetramer/ASU
Cell dimensions
  a, b, c (Å)                       97.70, 98.00, 401.52
Resolution (Å)*                     3.70 (3.70–3.80)
Rsym (%)*                           7.0 (130)
<I/σ(I)>*                           16.6 (1.8)
Completeness (%)*                   99.6 (99.3)
Redundancy                          7.3

Refinement
Resolution (Å)                      49.47–3.7
No. reflections                     42,058
Rwork/Rfree (%)                     23.2/27.8
No. of protein atoms                18,654
Averaged B-factor, protein (Å²)     212
R.m.s. deviations
  Bond lengths (Å)                  0.004
  Bond angles (°)                   0.889

* Data in the highest-resolution shell are indicated in parentheses.
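For reference, the agreement statistics in the table are conventionally defined as below. These are the standard textbook definitions, given as an aid to reading the table rather than taken from this study's methods. Rsym measures the spread of repeated measurements of the same reflection, whereas Rwork and Rfree apply the same R formula to the reflections used in refinement and to a small test set withheld from refinement, respectively.

```latex
\[
R_{\mathrm{sym}} = \frac{\sum_{hkl}\sum_{i} \left| I_i(hkl) - \langle I(hkl)\rangle \right|}{\sum_{hkl}\sum_{i} I_i(hkl)},
\qquad
R = \frac{\sum_{hkl} \big|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\big|}{\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
\]
```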
LILLIE PAQUETTE, MIT SCHOOL OF ENGINEERING


THE CELL MENAGERIE: 
HUMAN IMMUNE PROFILING 


Cutting-edge tools and analyses are digging deeper than ever before 
to unveil the intricacies of the diverse human immune system. 


Advanced technologies enable researchers at the Massachusetts Institute of Technology to observe individual immune cells attacking tumour cells. 


BY MARISSA FESSENDEN 


Vaccines save lives — but they don't always work. Take the annual influenza shot: by some estimates, flu vaccines are only 50–70% effective even when well matched to the virus strains in broad circulation. Despite all the research, scientists still cannot predict whether a given vaccine will work for any specific person.

Learning to make vaccines that protect more people means getting a better handle on the immune system — a bewildering militia of cells that communicate to detect and destroy pathogens. So far, attempts to parse the system's complexity have involved work on mice, rats, rabbits, dogs, non-human primates and even lampreys and sea urchins. Yet results do not always translate to the one species that medicine cares most about. “There has been a vast zoo of animal models, but the one animal model we haven't yet exploited is us — Homo sapiens,” says Bali Pulendran, an immunologist at Emory University in Atlanta, Georgia.
Now, researchers are tackling the most difficult animal to study as never before. Advances in technology are helping scientists to dive deeper into the inner workings of single cells and to analyse greater numbers of cells at once. Efforts in data analysis, sharing and collaboration promise to enable work that is too expensive for individual labs. Ultimately, researchers hope to bring fresh insights to the clinic to protect and treat people using the power of an individual's own immune defences.
The human immune system is incredibly diverse. Each class of immune cell is actually an army of subtypes. The elite forces — the lymphocytes, which recognize specific pathogens or wayward body cells — consist of natural killer (NK) cells, which quickly dispatch infected or cancerous cells, and B and T cells, which bear receptors on their surfaces designed to recognize specific invaders. But B and T cells break down further: there are regulatory T cells, T helper cells, memory B cells, naive B cells and more, each with its own unique role. These lymphocytes coordinate in turn with cells such as macrophages and monocytes, which are further specialized for other functions.

Diversity manifests between people, too. Even identical twins vary in terms of the exact molecules and cell profiles that fight off disease. From an evolutionary point of view, variability ensures that some members of a species will survive a deadly disease outbreak — but it confounds researchers.
Gender, ethnicity, genetic background and disease history all affect a person's immune response in unpredictable ways. They influence whether a vaccine will work, and whether someone has allergies or an autoimmune disease — both resulting from an overactive immune system — or whether a person will develop cancer, which is caused in part by an inattentive system that fails to remove errant cells.


VACCINES UNVEILED

Instead of seeing confusion in such diversity, researchers such as Pulendran see opportunity. With the right combination of sophisticated technologies and data analysis, human variation can offer a natural experiment in what underlies an effective immune response.

This reasoning led Pulendran and his team to some groundbreaking research on why a vaccine for yellow fever works so well. Since immunization against the sometimes-deadly tropical disease began in 1937, only 12 cases of yellow fever have been reported among the hundreds of millions of people immunized.

Scientists have long known that the vaccine spurs the body to produce T cells that can kill cells infected with the yellow-fever virus — but they did not know how. In 2009, Pulendran's team published an analysis¹ of changes in the state, number and types of immune cells in the body before and after vaccination. The group found that quantities of a protein called EIF2AK4 spike in key immune cells (mainly dendritic cells, which help T cells to identify invaders) just days after vaccination. The higher the spike in protein levels, the more anti-yellow-fever T cells are later produced.

The close correlation suggests the existence of components that foster strong immune responses — at least for the yellow-fever vaccine. Pulendran and his colleagues² have since discovered other proteins that predict similarly strong responses to vaccines for flu and meningococcal disease. Now, they are linking these types of marker to subpopulations of cells and classifying variation across individuals.

One major reason that immune responses vary is the vast collection of receptors on the surfaces of T and B cells; the B-cell receptors are membrane-bound forms of the antibodies that B cells secrete. To produce a near-endless assortment of these Y-shaped molecules, lymphocytes shuffle their genes as they mature. The myriad receptors and antibodies that result enable the immune system to recognize many different pathogens.

Researchers want to sequence genes for these receptors to work out what makes a potent immune response, and so gain clues for developing vaccines and for designing therapies that could spur the immune system to fight cancer.

But because each receptor is made from proteins encoded by at least two types of separately shuffled gene segment, sequences alone are not enough. Researchers must also learn how these proteins are paired in an individual cell — and which combinations show the most promise for fighting disease.

At the University of Texas at Austin, chemical engineer George Georgiou has tackled this challenge by studying B cells one at a time. He and his team³ first encase individual cells in complex droplets: a water core that preserves the cell's genetic material; an outer oil layer to keep the cells separated; and magnetic beads that allow researchers to manipulate each droplet and so capture and extract the genetic material from individual cells. The resulting sequence data can reveal the antibody repertoire elicited by various stimuli — crucial information for designing vaccines (see ‘Profile of an immune cell’).

PROFILE OF AN IMMUNE CELL
Complex droplets consisting of water, oil, cells and magnetic beads reveal how gene sequences are paired in individual immune cells.
[Graphic: A special nozzle encases individual cells in a core of water surrounded by oil and carrying tiny magnetic beads. Cells encased in the droplets are lysed, and genetic material adheres to the beads. Beads are collected and genetic material is prepared for sequencing. Isolated genetic material is sequenced, revealing the unique immune receptors present in each cell. Labels: rapidly moving oil phase; magnetic bead; lysed cell; mRNA strand.]
Source: ref. 3

Georgiou's group hopes to publish manual-like methods so that others can use the technique. Investigators who are unfamiliar with it should prepare themselves for a steep learning curve, he says: “Every method, especially new methods from academic labs, has some nuances.” But for those who are willing to put in the time and elbow grease, precise answers about individual immune cells await.

Another sequencing approach relies on specially formulated beads to tag individual cells before DNA analysis. The tags fuse with the cells' genetic material and function as barcodes that can be traced back to the original cell, even when cells are analysed in pools. Hedda Wardemann, an immunologist at the Max Planck Institute for Infection Biology in Berlin, has used this strategy to analyse genes encoding paired receptor proteins in more than 46,000 B cells at a time⁴.
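Computationally, the barcoding idea amounts to regrouping pooled reads by their tag. The minimal Python sketch below assumes a simplified, hypothetical read format of (barcode, chain, sequence) and uses the heavy and light chains of an antibody as the pair to recover. It takes a simple majority vote per chain and ignores barcode sequencing errors, cells that share a barcode and other complications that real pipelines must handle.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Toy read record (hypothetical format): (cell_barcode, chain, sequence),
# where chain is "heavy" or "light".
Read = Tuple[str, str, str]


def pair_chains(reads: List[Read]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Regroup pooled reads by cell barcode and pick one heavy and one light sequence per cell."""
    # counts[barcode][chain][sequence] = number of reads supporting that sequence
    counts: Dict[str, Dict[str, Dict[str, int]]] = defaultdict(
        lambda: defaultdict(lambda: defaultdict(int))
    )
    for barcode, chain, seq in reads:
        counts[barcode][chain][seq] += 1

    def majority(by_seq: Optional[Dict[str, int]]) -> Optional[str]:
        # Most-supported sequence for one chain, or None if that chain was not recovered.
        if not by_seq:
            return None
        return max(by_seq, key=by_seq.get)

    return {
        barcode: (majority(by_chain.get("heavy")), majority(by_chain.get("light")))
        for barcode, by_chain in counts.items()
    }


# Usage with toy sequences: cell AAC yields a heavy/light pair; GTT has no light-chain read.
reads = [
    ("AAC", "heavy", "QVQLVQ"),
    ("AAC", "heavy", "QVQLVQ"),
    ("AAC", "light", "DIQMTQ"),
    ("GTT", "heavy", "EVQLLE"),
]
print(pair_chains(reads))  # {'AAC': ('QVQLVQ', 'DIQMTQ'), 'GTT': ('EVQLLE', None)}
```

The majority vote is the simplest way to tolerate a few erroneous reads per cell; it is a design choice for illustration, not a description of any published method.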

Most microfluidics sequencing platforms are developed by individual labs, such as Georgiou's or Wardemann's, that have engineering know-how. But as the field grows, companies are getting into the game. One of the biggest players in the microfluidics field is Fluidigm in South San Francisco, California.

This September, the company began to ship high-throughput chips, which can be used on the company's C1 microfluidics platform to interrogate the genomes of 800 individual cells in a single 6.5-hour run. Although this is a much lower throughput than that of Georgiou's droplet method (which can process 6 million B cells in a day), Fluidigm's technology requires less expertise. The company plans to increase throughput to nearly 100,000 cells per run in the near future.
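A back-of-the-envelope calculation makes the throughput gap concrete. It assumes, purely for illustration, that C1 runs could be chained back to back around the clock with no loading or setup time, which overstates real-world throughput:

```latex
\[
\frac{24\ \text{h/day}}{6.5\ \text{h/run}} \times 800\ \text{cells/run} \approx 2{,}950\ \text{cells/day},
\qquad
\frac{6\times10^{6}\ \text{cells/day}}{2{,}950\ \text{cells/day}} \approx 2{,}000
\]
```

If the planned 100,000-cell runs also took 6.5 hours (the run time is not stated, so this is a further assumption), the same arithmetic gives about 3.7 × 10⁵ cells per day, narrowing the gap to roughly 16-fold.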


PERTURBED POPULATIONS

In addition to profiling individual cells (see ‘Nanoarenas for cell attacks’), researchers want to track how cell populations change in response to vaccination or infection. To identify specific cell types, scientists rely on protein markers studded on the cells' surfaces. For example, two markers dubbed CD4 and CD8 both show up on certain types of memory T cells — but CD8 is also on NK cells, and CD4 is on monocytes and dendritic cells. So, to measure only memory T cells, researchers may need to screen for three different markers. To isolate an even more-specific subset, the number of markers must increase.
Conventionally, researchers have relied on a cell-profiling technology called flow cytometry, in which coloured, fluorescent proteins are attached to specific cell markers so that combinations can be easily detected and the cells scored or sorted. But overlaps in colour spectra generally limit analyses to as few as a dozen markers.

The latest iteration of cell-profiling technology — mass cytometry — uses rare-earth metals instead of fluorescence and can detect more than 40 markers. Because mass cytometry can identify so many cell types in a single sample, more types of experiments can be done.
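To see why going from about a dozen markers to more than 40 opens up so many more experiments, consider a deliberately crude model in which each marker is scored only as present or absent. With n markers there are 2ⁿ possible on/off combinations. Most of these do not correspond to real cell types, and real analyses also use expression levels, but the scaling is the point:

```latex
\[
2^{12} = 4{,}096
\qquad\text{versus}\qquad
2^{40} = 1{,}099{,}511{,}627{,}776 \approx 1.1\times10^{12}
\]
```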

Studies in babies, for example, are key to understanding the immune system's development. But infants generally cannot tolerate blood withdrawals of more than 4–5 millilitres — and even simple flow-cytometry experiments can require more than 10 ml. Mass cytometry, by contrast, can run on less than 4 ml.

Mark Davis, a molecular immunologist at Stanford University in California, used mass cytometry to track hundreds of parameters — including 72 different immune-cell populations — in the blood of 210 twins. His team found⁵ that much of the variation between people's immune systems can be attributed to environmental factors, rather than to genetic ones. Without mass cytometry, this work would have been too complex to perform, he says.

DVS Sciences, now part of Fluidigm, developed a mass cytometer called the CyTOF for use in cell profiling. The latest version (as well as upgrades for the older system) boosts sensitivity and sample-processing speed, and can run multiple samples at a time.

But these technologies are expensive. The version of the CyTOF released in June — the ‘Helios’ system — starts at roughly US$500,000, not counting service contracts. At Stanford, Davis and other researchers rely on shared facilities.


ASSEMBLING THE PIECES 

Although scientists are making progress, many tools have been slow to reach the clinic, says Padmanee Sharma, a physician-scientist at the University of Texas MD Anderson Cancer Center in Houston. Every new clinical technology needs standards and quality assurances, which require extensive testing to establish. Clinical trials are only now adopting procedures that might help clinicians to track their patients' immune responses and feed into treatment decisions.

Communication is another bottleneck. Information is accumulating rapidly and needs to be shared by collaborators as diverse as statisticians, clinicians, basic biologists and technologists. Coordinating research that involves human participants places huge demands on logistics, resources and expertise, and one major effort to facilitate such work is the Human Immunology Project Consortium (HIPC), funded by the US National Institutes of Health (NIH). The HIPC doles out grants to advance methods, and endeavours to extend the fruits of researchers' labour to all.

The consortium offers an online data-analysis and management platform called ImmuneSpace, which helps researchers to place data in a long-term archive called the Immunology Database and Analysis Portal (ImmPort), also funded by the NIH. The HIPC is spearheading efforts to standardize procedures for commonly performed assays in cytometry as well as alternative methods of immune profiling, such as measuring antibodies in serum samples.

Nanoarenas for cell attacks

In addition to tracking populations of immune cells, researchers want to know how they interact.

Christopher Love, an immuno-engineer at the Massachusetts Institute of Technology in Cambridge, is using microfluidics to probe how individual immune cells cooperate with each other. His lab engineers devices that he describes as “essentially ice-cube trays”: each well in the tray holds sub-nanolitre volumes, as opposed to the tens of microlitres held by wells in more-conventional plates.

Using these tiny arenas to watch natural killer (NK) cells home in on leukaemia cells, the team has discovered — unexpectedly — that even a single NK cell will attack a cell that does not belong. In the past, researchers suspected that NK cells coordinated their actions through secreted chemical signals, but now it seems that such cooperation may be necessary only among larger cell groups.

Love and his team hope to map their understanding of interactions at this single-cell level to the immune system as a whole, and potentially compare healthy individuals with those who have cancer. “With these technologies, first you ask: can we define normal?” he says. “Then you can think about heterogeneity in disease.” M.F.

Another emerging need is for techniques for easy cross-analysis of many data types, says Steve Kleinstein, a computational immunologist at Yale University in New Haven, Connecticut. “There's a lot of subtlety in the data, and it's very easy to pick up a piece of code or tool that somebody put out there on the web, run it with your data and get a plot that looks interesting — but that's a very dangerous thing to do,” he says.

To help solve this problem, Kleinstein and his group have developed software called the Repertoire Sequencing Toolkit (pRESTO), which offers a way to process, annotate and correct raw sequencing data from high-throughput platforms such as Illumina. It also allows researchers to run their data in different computing environments and then return to the pRESTO environment.

A separate tool, a web portal known as the VDJServer, is in beta-testing after launching in April. It offers the ability to analyse B- and T-cell-receptor data, with the goal of providing an intuitive interface for users who have not done any programming, says project leader Lindsay Cowell, a bioinformatician and immunologist at the University of Texas Southwestern Medical Center in Dallas. The server will incorporate more analysis tools into the portal as they become available (Kleinstein's pRESTO is already embedded). Moreover, the portal lets researchers share data and even tap into the computing power of the Texas Advanced Computing Center at the University of Texas at Austin.
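One routine step in processing raw reads, of the kind pRESTO is described as automating, is discarding low-quality reads. The Python sketch below is a generic illustration and not pRESTO's code or interface. It assumes FASTQ quality strings in Phred+33 encoding (the encoding used by current Illumina instruments) and an arbitrary, hypothetical cutoff of a mean quality score of 20.

```python
from typing import List, Tuple

# (read_id, sequence, quality_string), as they would come from a FASTQ file.
Record = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(records: List[Record], min_mean_q: float = 20.0) -> List[Record]:
    """Keep records whose mean quality reaches the (hypothetical) threshold; skip empty reads."""
    return [rec for rec in records if rec[2] and mean_phred(rec[2]) >= min_mean_q]


records = [
    ("read1", "ACGT", "IIII"),  # 'I' encodes Q40
    ("read2", "ACGT", "####"),  # '#' encodes Q2
]
print([rec[0] for rec in filter_reads(records)])  # ['read1']
```

This is exactly the kind of small script Kleinstein warns against applying blindly: a mean-quality cutoff can hide a run of bad bases at one end of a read, so thresholds and filtering strategy need checking against the data at hand.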

There is still an acute need for human-immunology-specific data repositories, notably for T- and B-cell-receptor sequencing data, says Jamie Scott, a molecular immunologist at Simon Fraser University in Burnaby, Canada, who is co-leading an effort to share such data.

But perhaps the biggest block is a basic one: a dearth of training. Most analysis requires some programming skills, says John Tsang, head of computational systems biology for the Trans-NIH Center for Human Immunology in Bethesda, Maryland. For now, most tools are limited to the specialist, he says; collaboration with those who understand the programming is still the best way forward.

Creating more collaborations should, in turn, help to ensure that the tools truly further basic knowledge and translate into practical applications. “It is very attractive to apply the latest gee-whiz ’omics technology to measure things,” says Pulendran. “But I think we need to go beyond measuring and accumulation of data — to knowledge and to understanding.” ■


Marissa Fessenden is a science journalist and 
illustrator based in Bozeman, Montana. 
WORK ENVIRONMENT


When labs go bad 


A toxic relationship between junior scientist and adviser 
can quickly turn career prospects sour. 


BY CHRIS WOOLSTON 


There is no crying in baseball, according to a famous quote from the 1992 film A League of Their Own. But there is most certainly crying in science, says Isaiah Hankel, a former cell biologist turned author and career coach. He admits shedding a couple of tears in a bathroom cubicle after his graduate adviser screamed at him in front of the entire lab — all while another principal investigator (PI) looked on. “It was the craziest thing,” he says.


But it did not come entirely out of the blue. During his fifth year of study, Hankel had been promised an industry job — on the condition that he get his PhD first. Unfortunately, his PI was not on board with the plan. “He totally withdrew his support,” he says. “I wanted to map out exactly what I needed to do for graduation, but he would never nail it down.”

Like Hankel, many junior researchers come to realize that their relationship with their PI — the one person who is most in control of their careers — is not working out. “I've seen a lot of situations where people are having problems with their supervisors,” says Sarah Blackford, head of education and public affairs for the Society for Experimental Biology, headquartered in London. “People get very emotional, and things can escalate.” Blackford, who is based at Lancaster University, UK, and advises junior researchers throughout Europe, says that postdocs and graduate students in broken labs must work out the crucial next step. Are they going to endure a bad situation? Are they going to find a way to mend the relationship? Or are they going to jump ship?

Whatever the decision — endure, repair or escape — the conflict will probably become a career turning point. Junior researchers who run afoul of their PIs may feel stuck, and they could end up with one fewer letter of recommendation than they had originally counted on, but that does not mean that their science days are over. With a positive attitude, a knowledge of institutional policies and some objective, well-placed allies, it is possible to move on, within academia or beyond.


A CHANGE IN TACK

Hankel quickly realized that long hours and dedication were not going to be enough to break the impasse with his PI. “Working extra hard is exactly what my PI wanted,” he says. “I was getting more data for him. But if they aren't going to give you a target to hit, you can't keep spinning your wheels.” Instead of working harder, Hankel used some of his paid time off, giving himself time to make a plan. He started attending conferences, which he paid for out of pocket. That sort of networking, he says, can be especially important in times of conflict. He kept daily records of his interactions with his PI, and he saved all of the relevant e-mails.

Most importantly, he set up meetings with his department head and several deans, and discussed with them his need for a clear path to graduation. He also consulted the school's official graduate-school manual, which gave him a major source of leverage. Among other pronouncements, the manual noted that students were expected to graduate within five years, and that advisers were supposed to actively support their students' progress. Prompted by the meetings, his adviser finally told him the exact steps that he needed to take to finish his dissertation. With an exit plan in place, Hankel was able to get his degree about a year after all of the trouble started.

Conflicts with senior scientists can be especially bewildering for PhD students, says Karen Kelsky, a science job coach in Eugene, Oregon, and author of The Professor is In: The Essential Guide to Turning your PhD into a Job (Three Rivers Press, 2015). PhD students do not always have the interpersonal experience to handle rocky relationships, she says, and they are often unprepared for the rigid hierarchy of academia (see ‘Look before you leap’). “There are some aggressive advisers who like the power and just want to see a person get destroyed,” she says. Instead of letting a student defend a thesis and receive a PhD, they ask for one more rewrite or one more experiment, not because the work is crucial, but to remind the student who is really in command, she says. “That's probably the most common story I hear,” she says.

In many cases, students can get their freedom by putting their head down and meeting every request, even if it seems wrong or unhelpful. “That's what ended up happening to me,” says Kelsky, who has a PhD in cultural anthropology. “I revised my dissertation by taking out everything my adviser hated and putting in everything she liked.” As they approach the finish line, she says, students should think less about their literary legacy and more about making their PI happy. “A lot of graduate students are obsessed with their dissertations, but the fact is that nobody is going to read them. They shouldn't get so worked up.”

THE BEST DEFENCE

Look before you leap

Many junior researchers who find themselves at odds with their advisers could have avoided trouble with a little preliminary research. For PhD students, it is helpful to find someone who has a history of turning trainees into scientists, says career adviser Karen Kelsky in Eugene, Oregon.

“Be a good detective. Check with other graduate students and postdocs, and look at the track record,” she says. The statistics will tell the story — many ‘bad’ advisers have never guided a doctoral student through to the point at which he or she actually earned a degree. Of course, some principal investigators are too new to have much history. In those cases, Kelsky says, students should check with prospective advisers to make sure that they are committed to helping students to earn their degrees.

Postdocs too often take a scattershot approach to finding a lab, says Sofie Kleppner, an assistant dean in the office of postdoctoral affairs at Stanford University in California. “Some of them will spam the entire university looking for a position,” she says. “They spam me, and it's been a long time since I've had a lab.”

Instead, they should conduct a much more focused search for a lab that is compatible with their personality, rather than just their scientific interests. She recommends that postdocs give a talk to the principal investigator and members of a prospective lab, creating an important opportunity for both sides to look for a good fit. In addition, they should set up an in-person chat with the adviser — and have lunch or dinner with other people in the lab. This is the chance to ask a question that could prevent a lot of future trouble: what is the worst thing about working in this lab?

If the complaints run far beyond the normal scientific grumblings, it is better to keep looking. C.W.

SEARCH FOR ALLIES

One cognitive scientist, who asked not to be named, received her PhD from a prestigious university on the US West Coast. She says that her relationship with her PI fell apart in the fourth year of a five-year programme, a particularly vulnerable time in her education. A combination of misdeeds, misunderstandings and hurt feelings left her wondering whether she should abandon the programme and start again. Among other questionable behaviours, her adviser seethed when she did some work with a rival lab during her adviser's sabbatical. When her adviser gave one of her projects to another student, she felt the relationship was irretrievably damaged. But instead of quitting her PhD programme, she had coffee with a faculty member who helped her to look at the big picture. “She said I shouldn't throw away four years of work.” The same faculty member stepped up to become the co-chair of the student's committee, a position from which she could ensure that the degree process would be fair and unbiased. “She made sure I wasn't retaliated against,” says the cognitive scientist, who is now a tenure-track assistant professor at a US university.

For postdoctoral researchers, conflicts with PIs can cause a lot of soul-searching and career angst, says Sofie Kleppner, an assistant dean in the postdoctoral-affairs office at Stanford University in California. “It's a huge issue if you're in a lab and you feel like it's the wrong lab for you,” she says. In her experience, postdocs often feel as if they and their advisers are not on the same page. “One of the biggest problems is mismatched expectations,” she says. “A postdoc might want to be independent, but a PI might be the type who likes to check in. That can cause a lot of frustration.”

In some cases, simple misunderstandings can cause a lot of tension. “A postdoc might tell me that they don't want to go into academia, but they're afraid to tell their PI,” Kleppner says. “And then the PI will say that he's worried because the postdoc doesn't seem cut out for academia.” The upside of simple misunderstandings, she says, is that they often have an equally simple solution: talking about it.


THE ART OF CONVERSATION

As professional scientists, postdocs need to take a business-like approach to conflicts with their PIs, Blackford says. That means communication — and a lot of it. “You have to talk about the situation without getting personal,” she says. “Set up a meeting with a proper agenda.” Blackford adds that not all PIs are especially approachable or easy to talk to. If one-on-one conversations do not completely solve the problem, she recommends finding an impartial faculty member who is willing to offer confidential advice.

Science-career coach Karen Kelsky helps PhD students to navigate the job world.

In some cases, Blackford says, discussing the situation with an objective ally can help disgruntled junior researchers to understand the true source of their discontent. “Some people can't even put a finger on what's gone wrong,” she says. “They just don't feel respected, and then they have a crisis of confidence. It's helpful to talk with someone who can tease out what you're saying.”

Postdocs should develop on-campus allies who can serve as sounding boards and counsellors. “I tell people to identify their peer support and mentors early,” Kleppner says. “You need someone who can advocate for you if something isn't working out.” Adding another person to the conversation can be a quick way to find compromise and clarity, she says. “It's basic ‘Conflict Resolution 101’.”

Not all conflicts can be resolved — some postdocs eventually decide to leave a lab for good. “These are high-powered people who don't want to admit failure,” Kleppner says. “But it's OK to admit it.” When it is time to leave, professionalism is more important than ever. She recommends explaining the decision to a PI in clear, dispassionate terms — the same tone that is needed when talking to other PIs about a possible job. Naturally, prospective PIs will want to know why the last job did not work out, but they don't want to be dragged into the drama. A postdoc who can clearly communicate why the last lab was not an ideal fit — without making any personal attacks on his or her former PI — will have a good chance of moving on. “You're not going to ruin your reputation as long as you don't ruin anyone else's,” says Kleppner.

Hankel managed to leave academia with his reputation — and his degree — intact. As a career consultant, he now encourages other scientists to stand up for themselves even when the hierarchy is tipped against them. He notes that some scientists end up spending so many years doing their PhD and multiple postdocs that they barely have time to establish their careers before retirement. “Advisers hold the keys to people's lives,” he says — which means that it is important to resolve disputes as quickly as possible and avoid spending too much time in a lab that will not promote a junior researcher's progress. When a PI is not being supportive, Hankel says, early-career researchers have to prioritize their professional interests — even if that means hurt feelings, bruised egos and a change of venue. “It's always appropriate to have self-respect,” he says. ■


Chris Woolston is a freelance writer in 
Billings, Montana. 


A forced lab move can be a hassle. Find out how 
to handle it seamlessly in an upcoming issue of 
Nature Careers. 


TURNING POINT 
Martin Jinek


Structural biologist Martin Jinek helped to launch the genome-modification craze that is upending biological research. Now running his own laboratory at the University of Zurich in Switzerland, Jinek describes how research is changing as CRISPR — a gene-editing tool with the potential to cheaply alter plants, animals and even human embryos — takes hold.


Did you set out to work on CRISPR after completing graduate school?

No. When I started as a postdoc in Jennifer Doudna's group at the University of California, Berkeley, in 2007, we knew practically nothing about CRISPR, which stands for ‘clustered regularly interspaced short palindromic repeats’. The first paper describing it as an adaptive immune system in bacteria came out early that year (R. Barrangou et al. Science 315, 1709–1712; 2007). Although Doudna was one of the first to explore CRISPR, my original project was on the molecular mechanisms of microRNA. But the CRISPR field became more interesting, so I collaborated with some group members and finally began my own project working on Cas9, an enzyme that cuts DNA.


When did it become clear that CRISPR was a game changer?

We were interested at first because it looked similar to RNA interference, in which RNA molecules inhibit the expression of genes. But the molecular machinery was intriguingly different. The wider implications — and its potential utility in genome research — became clear only after we learned that it cuts double-stranded DNA and is programmable, which made it even more interesting to work on.


What is most surprising about this technology?

How quickly it has developed. Within six months of publishing a paper showing that CRISPR can be programmed (M. Jinek et al. Science 337, 816–821; 2012), three labs — including ours — were using it as a genome-editing tool. Within 12 months, researchers were applying it to many cell types and organisms.


How is CRISPR shaping your research agenda?

My goal is to understand how the system
actually works. My resources are not unlim- 
ited, so I focus on what I do well — structural 
biology. Five of the ten people in my lab, which 
began in 2013, are aiming to gain a better 
structural understanding of the DNA-cutting 
mechanisms in CRISPR systems so that we can 
engineer the system to be more efficient and 
versatile. The CRISPR technology is finding
applications in basic-research labs, as well as in
biotechnology and molecular-medicine labs, 
to potentially cure genetic disease or engineer 
organisms to make biofuels. I’m already using 
it to address other research questions. 


What did you take from your experience as a 
graduate student in a new lab? 

I was the third PhD student in Elena Conti’s first 
laboratory, at the European Molecular Biology 
Laboratory in Heidelberg, Germany. She was 
a fantastic mentor, and being in her lab at an 
early phase of her career has shaped my own 
lab. She was a tough boss, but she taught me 
how to approach a scientific problem to find the
right questions, and how to do good science to 
answer those questions. 


Has the public reaction to CRISPR had an 
impact on your work? 

On some level, we anticipated it would be big. 
We just didn’t know how big. The wider societal 
and potential ethical issues associated with the 
use of CRISPR, especially those that relate to 
human-genome modification, have generated 
a lot of attention. The negative side of working 
in the CRISPR field is that it is so competitive, it 
leaves little time for anything else.


INTERVIEW BY VIRGINIA GEWIN 


This interview has been edited for length and clarity. 


CORRECTION 

The Careers feature ‘Mind Wide Open’ 
(Nature 525, 147-148; 2015) stated
that BEST had offered career training to
about 10,000 graduate students and 600 
postdocs since its launch. In fact, at least 
4,000 postdocs have benefited. 
WADING INTO WATER


BY TODD HONEYCUTT 


We all come from the sea.
That’s what the science
books tell us, and what I
think of as I walk the beaches near my
home, listening to the ocean waves 
sing their songs against the earth; 
watching the gulls struggle with 
each other for scraps of food; 
collecting odd debris and shells 
pushed up by the waves. 

What I think of as I remem- 
ber my daughter. 

Our last conversation — our 
last face-to-face conversation — 
occurred almost two years ago. 
Pearl had called, said she wanted 
to see me, had something impor- 
tant to say. I asked her to come to 
my shore house, because the city per- 
plexes me more than it did when I was 
younger, and the tone of her voice made 
me think that it would be better on my turf 
than hers. 

We weren't estranged, Pearl and I. After 
her mother’s and my divorce, and her flow- 
ering into her own as an adult, we had just, 
as they say, grown apart. A natural progres- 
sion of our lives, I guess, as children stand on 
their own, find their own paths. But I wished 
things had been different. 

Still do. 

Pearl came the following weekend, a 
Saturday morning in September. We bought 
coffees and walked to the boardwalk, found 
a bench facing the ocean. People and 
umbrellas dotted the sand between us and 
the sea. No cloud in that tranquil sky tem- 
pered the brightness of the Sun. 

We talked of pleasantries, memories of 
beach vacations past. Then Pearl cut to the 
core of the issue. 

“I’m going to be uploaded.”

“Uploaded?” I said. 

I had heard of it, of course, but hadn’t paid
any attention. Didn't concern me. Not much 
in the news did. One of the advantages of 
growing old — nothing seems newsworthy 
any more. 

Almost nothing. 

“A friend of a friend has me on a list. It’s 
beyond the experimental stage now. It’s safe.” 

“But you lose...” 

“I lose this body, and I gain so much more.
It’s the new frontier, Dad.”

As if it were the Wild West or the Moon 
base. 


On the shores of memory. 


What can a father say to the choices his 
children make? How many times had my 
pleas had no effect? Or worse, cornered her 
to become more firmly entrenched? 

When my words had ebbed, Pearl filled the 
space with information about the procedure. I 
couldn’t hear her words, or what I could hear,
I couldn't make sense of. I know more now. 
How the brain is put in a small vat, bathed in 
salts and chemicals and solutions. Wired so 
that its consciousness is free to roam worlds 
both virtual and real. Entire civilizations rise 
and fall, fantastic landscapes more strange 
than any I can imagine have people living, 
working, achieving, actualizing. Perhaps it is 
the new frontier, but it’s hard for me to under- 
stand a world without the taste of food and 
drink, the feel of sand and water. 

Yes, I know more about it now. 

“Honey, it seems so permanent, so...” 

“I’ll still be around.” Pearl touched my arm.
“I’ll still be me. Just... in a different form.”

I finished my coffee, held the cup against 
my leg to keep the wind from tossing it 
along the boardwalk. I looked at Pearl, at the 
woman she had become. Strong, independent. Her stubborn-
overwhelming curiosity and political astuteness into a career transforming boutique
biotech companies into global players. 
I was proud of her. I just wished I could 
have shared her excitement. 
“When?” 
“Not sure, but soon.” 

“What about your mother?” 

“I haven't told her.”

“She'll be heartbroken.” Easier
to say than that I was heartbroken.

“Some day,” she said, “maybe
you both will join me.”
Some day, her mother would. 

“Let’s go swimming,” I said.

“Swimming?” 

I pointed to the ocean. 

“I didn't bring a suit.”

“I know a store down the road.”
I pointed behind us. 
“Dad, really.”
“Please.” 
She acquiesced. 

We waded into the water, my daughter 
and me, for what would be the last time. 
It wasn't Pearl at age two, screaming at the 
monstrosity of the ocean, its vastness. It 
wasn't the joy of Pearl at nine, excited at 
each wave, her whole body giggling as she 
fought and swam and dived. It wasn’t the 
angst of Pearl at 15, the constant churn- 
ing of, and chattering with, friends. It was 
Pearl as an adult. The water took us and 
allowed us to be together in a way we hadn't 
in years. We floated and body surfed, smiles 
on our faces as the waves crashed over us 
and threw us into the sand, the crunch of 
shells under our feet, the briny taste of the 
water filling our mouths, and in between, 
our small talk of city life and beach life and 
our shared memories of family and each 
other. Throughout it all, the sounds of the 
ocean rising and falling, the weight of my 
heart, rising and falling. 

The papers are signed, my name on a list.
I focus on the tastes and touches and sounds 
I encounter, as if they'll be the last I’ll know.
And each day, I sit on the boardwalk and 
wonder, as the waves rise and fall, whether 
I'll feel the same as our last day in the ocean 
when I next see my daughter.


Todd Honeycutt is a public-health 
researcher in New Jersey who enjoys 
thinking about alternatives (and 
alternatives to alternatives). His stories have 
appeared in Fiction Vortex. 


ILLUSTRATION BY JACEY 


